
Ann Math Artif Intell, DOI 10.1007/s10472-013-9392-4

Conformal predictions for information fusion
A comparative study of p-value combination methods

Vineeth N. Balasubramanian · Shayok Chakraborty · Sethuraman Panchanathan

© Springer Science+Business Media Dordrecht 2014

Abstract The increased availability of a wide range of sensing technologies over the last few decades has resulted in a correspondingly increased need for reliable information fusion methods in machine learning applications. While existing theories such as the Dempster-Shafer theory and the possibility theory have been used for several years now, they do not provide guarantees of error calibration in information fusion settings. The Conformal Predictions (CP) framework is a new game-theoretic approach to reliable machine learning, which provides a methodology to obtain error calibration under classification and regression settings. In this work, we present a methodology to extend the Conformal Predictions framework to both classification and regression-based information fusion settings. This methodology is based on applying the CP framework to each data source as an independent hypothesis test, and subsequently using p-value combination methods as a test statistic for the combined hypothesis after fusion. The proposed methodology was studied in classification and regression settings within two real-world application contexts: person recognition using multiple modalities (classification), and head pose estimation using multiple image features (regression). Our experimental results showed that quantile methods of combining p-values (such as the Standard Normal Function and the Non-conformity Aggregation methods) provided the most statistically valid calibration results, and can be considered to extend the CP framework for information fusion settings.

V. N. Balasubramanian (✉)
Indian Institute of Technology, Hyderabad, India
e-mail: [email protected]

S. Chakraborty
Intel Research Labs, Portland, OR, USA

S. Panchanathan
Center for Cognitive Ubiquitous Computing, Arizona State University, 699 S Mill Avenue, Tempe, AZ 85287, USA


Keywords Conformal predictors · Information fusion · Multiple hypothesis testing · Face processing applications

Mathematics Subject Classification 68T10: Pattern recognition · Speech recognition

1 Introduction

The rapid advancement and miniaturization of sensor technologies has resulted in the widespread use of multiple data sources (such as cameras, microphones, accelerometers, GPS devices, gyroscopes, magnetometers, and RFID devices) to capture and understand people, objects and activities. This has resulted in an increased need in machine learning applications for information fusion methods that can merge information from disparate data sources (with potentially differing conceptual and contextual representations) to provide a reliable prediction for the entity under question. Existing information fusion methods have been based on underlying theories such as the Dempster-Shafer theory [55], Bayesian theory [17], Possibility theory [18], Fuzzy integrals [8], MYCIN uncertainty factors [13], DSmT combination [25], Belief functions theory [27], and the GESTALT system [41]. Rogova and Nimier [53] categorized uncertainty estimation frameworks commonly employed as combinatorial functions in fusion systems into:

– Bayesian methods, which include probabilistic methods that use the prior probability, likelihood and posterior probabilities in the system. Examples of such methods include Bayesian fusion rules, weighted average methods and the incorporation of contextual information.

– Evidential methods, which include evidence aggregation rules, such as the Dempster-Shafer theory of evidence [55], and the transferable belief model [27].

– Possibility and fuzzy methods, where combination rules are based on t-norms and t-conorms (the fuzzy translations of intersection and union), such as the possibility theory [18].

While the aforementioned methods have been extensively used over the last several years, they do not provide desired properties of a confidence measure, such as validity/calibration (or, in some cases, generalizability to all classification and regression methods), in information fusion settings. The Conformal Predictions (CP) framework [60] is a new game-theoretic approach to reliable machine learning, which provides a methodology to obtain error calibration in the online setting, applicable to both classification and regression contexts. In this work, we propose a methodology to extend the CP framework to information fusion settings by combining p-values obtained from the individual data sources. We empirically investigate the validity and efficiency of different methods of combining p-values (assuming their independence) on real-world classification and regression problems. While the CP framework has been extended to other machine learning settings in the recent past, including active learning, change detection, anomaly detection and quality assessment, this is the first effort, to the best of our knowledge, on the application of this framework to an information fusion setting, and the subsequent study of the error calibration achieved by combining p-values obtained by using the framework on multiple data sources.

The remainder of the article is organized as follows. Section 2 presents a review of conformal predictors in classification and regression settings, and the rationale behind our approach in this work.


Section 3 presents the proposed methodology to extend the Conformal Predictions framework to information fusion settings. Section 4 presents the results of applying the proposed methodology in a classification setting on a person recognition problem (fusion of multiple modalities), and in a regression setting on the head pose estimation problem (fusion of multiple data features). We conclude with an analysis of our results and pointers to future work in Section 5.

2 Background and rationale

2.1 Conformal predictors in classification and regression: a review

The theory of conformal predictions was developed by Vovk, Shafer and Gammerman [56, 60] based on the principles of algorithmic randomness, transductive inference and hypothesis testing. This theory is based on the relationship derived between transductive inference and the Kolmogorov complexity [35] of an i.i.d. (independent and identically distributed) sequence of data instances. Hypothesis testing is subsequently used to construct conformal prediction regions, and obtain reliable measures of confidence. The methodologies for applying the Conformal Predictions (CP) framework in classification and regression settings, described in [60], are briefly reviewed below.

2.1.1 Conformal predictors in classification

The CP framework brings together principles of hypothesis testing and traditional machine learning algorithms through the definition of a non-conformity score, which is a measure that quantifies the conformity of a data point to a particular class label, and is defined suitably for each classifier. As an example, the non-conformity measure of a data point x_i for a k-Nearest Neighbor classifier is defined as:

$$\alpha_i^{y} = \frac{\sum_{j=1}^{k} D_{ij}^{y}}{\sum_{j=1}^{k} D_{ij}^{-y}} \qquad (1)$$

where D_i^y denotes the list of sorted distances between a particular data point x_i and other data points with the same class label, and D_i^{-y} denotes the list of sorted distances between x_i and data points with any other class label. D_{ij}^y is the jth shortest distance in the list of sorted distances, D_i^y. Figure 1 illustrates the idea. Note that the higher the value of α_i^y, the more non-conformal the data point is with respect to the current class label, i.e. the probability of it belonging to other classes is high.
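To make the computation in (1) concrete, the sketch below evaluates the k-NN non-conformity score with NumPy. The function name, the Euclidean distance choice and the toy data are ours, not taken from the paper.

```python
# Sketch of the k-NN non-conformity measure in (1), assuming Euclidean distances
# and a small labelled reference set. Illustrative code, not the authors' implementation.
import numpy as np

def knn_nonconformity(X, y, x_i, y_i, k=3):
    """alpha = (sum of k nearest same-class distances) /
               (sum of k nearest other-class distances)."""
    dists = np.linalg.norm(X - x_i, axis=1)
    same = np.sort(dists[y == y_i])
    other = np.sort(dists[y != y_i])
    return same[:k].sum() / other[:k].sum()

# Toy usage: a point far from its hypothesised class gets a large alpha.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_nonconformity(X, y, np.array([5.1, 5.0]), y_i=0, k=2))  # large (non-conformal)
print(knn_nonconformity(X, y, np.array([5.1, 5.0]), y_i=1, k=2))  # small (conformal)
```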

Given a new test data point, say x_{n+1}, a null hypothesis is assumed that x_{n+1} belongs to the class label, say, y(j). The non-conformity measures of all the data points in the system, x_1, x_2, ..., x_{n+1}, are computed assuming the null hypothesis is true. A p-value function is defined as:

$$p_j = \frac{\mathrm{count}\left\{i \in \{1, \ldots, n+1\} : \alpha_i^{y(j)} \ge \alpha_{n+1}^{y(j)}\right\}}{n+1} \qquad (2)$$

where α_{n+1}^{y(j)} is the non-conformity measure of x_{n+1}, assuming it is assigned the class label y(j). It is evident that the p-value is highest when all non-conformity measures of the training data belonging to class y(j) are higher than that of the new test point, x_{n+1}, which points out


that x_{n+1} is most conformal to the class y(j). This process is repeated with the null hypothesis supporting each of the class labels, and the highest of the p-values is used to decide the actual class label assigned to x_{n+1}, thus providing a transductive inferential procedure for classification. If p_j is the highest p-value and p_k is the second highest p-value, then p_j is called the credibility of the decision, and 1 − p_k is the confidence of the classifier in the decision.

Algorithm 1 Conformal Predictors for Classification

Require: Training set T = {(x_1, y_1), ..., (x_n, y_n)}, where the x_i are the data points and each y_i is the class label of the corresponding data point x_i; number of classes M; class labels y(i) ∈ Y = {y(1), y(2), ..., y(M)}; classifier Θ; confidence level ε
1: Get new unlabeled example x_{n+1}.
2: for all class labels y(j), where j = 1, ..., M do
3:   Assign label y(j) to x_{n+1}.
4:   Update the classifier Θ with T ∪ {(x_{n+1}, y(j))}.
5:   Compute the non-conformity measure values α_i^{y(j)}, i = 1, ..., n+1, and from them the p-value p_j w.r.t. class y(j), using (2).
6: end for
7: Output the conformal prediction regions Γε = {y(j) : p_j > 1 − ε, y(j) ∈ Y}, where ε is the confidence level.

Given a user-specified confidence level ε, the output conformal prediction regions, Γε, contain all the class labels with a p-value greater than 1 − ε. These regions are conformal, i.e. the confidence threshold ε directly translates to an upper bound on the frequency of errors, given by 1 − ε, in the online setting [59] (if the correct label for a given data point is not in the set of predicted class labels, this is considered an error). The methodology is summarized in Algorithm 1. As mentioned earlier, the CP framework can be used in association with any classifier, with the suitable definition of a non-conformity measure. Sample non-conformity measures for various classification algorithms can be found in [60]. In recent years, the CP framework has been applied with various classification methods including k-Nearest Neighbors [5], Support Vector Machines [4], neural networks [48], random forests [62] and evolutionary algorithms [33].

Fig. 1 An illustration of the non-conformity measure defined for k-NN
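The following sketch strings together Algorithm 1 and the p-value in (2), reusing the knn_nonconformity function from the sketch above. It is an illustrative leave-one-out implementation rather than the authors' code, and it follows the paper's convention of keeping labels whose p-value exceeds 1 − ε.

```python
# Sketch of Algorithm 1: transductive conformal classification with the k-NN
# non-conformity score defined earlier. Illustrative only.
import numpy as np

def conformal_classify(X, y, x_new, labels, epsilon=0.95, k=3):
    p_values = {}
    for label in labels:
        # Provisionally assign the hypothesised label to the new example.
        X_aug = np.vstack([X, x_new])
        y_aug = np.append(y, label)
        # Non-conformity of every example under this hypothesis (leave-one-out).
        alphas = np.array([
            knn_nonconformity(np.delete(X_aug, i, axis=0),
                              np.delete(y_aug, i),
                              X_aug[i], y_aug[i], k=k)
            for i in range(len(y_aug))
        ])
        # Equation (2): fraction of examples at least as non-conformal as x_new.
        p_values[label] = np.mean(alphas >= alphas[-1])
    # Paper's convention: keep labels whose p-value exceeds 1 - epsilon.
    region = [lab for lab, p in p_values.items() if p > 1 - epsilon]
    return p_values, region
```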

2.1.2 Conformal predictors in regression

The CP framework has also been used in regression formulations to deliver prediction regions that are calibrated [32, 46, 52, 60]. While the label space in a classification problem is a finite set, the label space in regression problems is continuous. This requires a different methodology of applying the framework, since it is not practical to hypothesize each value on the real line as a possible class label and compute a corresponding p-value. The algorithm to define conformal prediction regions for regression seeks to identify intervals (or neighborhoods) on the real line that conform to a pre-specified confidence level. A larger confidence level (say ε1) will result in a larger prediction interval Γε1, and a smaller confidence level (say ε2) will result in a narrower interval Γε2. It should be noted here that Γε2 ⊆ Γε1, as long as ε2 ≤ ε1. Given a new data point x_{n+1}, an error is said to occur when the actual label y_{n+1} of x_{n+1} is not present in the output region(s) ∪_i [ŷ_i, ŷ_{i+1}] (as in Algorithm 2 Step 11).

In a regression problem, the non-conformity measure can be defined as the absolute value of the difference between the actual value y_i and the predicted value ŷ_i [60]:

$$\alpha_i = |y_i - \hat{y}_i| \qquad (3)$$

Papadopoulos et al. [32] also suggested a modified non-conformity measure where the predicted accuracy of the decision rule f on a training set is used, i.e. the measure is defined as:

$$\alpha_i = \frac{|y_i - \hat{y}_i|}{\sigma_i} \qquad (4)$$

where σ_i is an estimate of the accuracy of the decision rule f on x_i.

An efficient algorithm to compute conformal prediction intervals in the case of ridge regression (regularized least squares regression) was proposed by Nouretdinov et al. [46], and is described below in Algorithm 2. S_i in Algorithm 2 is given by the following equation:

$$
S_i = \begin{cases}
[u_i, v_i] & \text{if } b_{n+1} > b_i \\
(-\infty, u_i] \cup [v_i, \infty) & \text{if } b_{n+1} < b_i \\
[u_i, \infty) & \text{if } b_{n+1} = b_i > 0 \text{ and } a_{n+1} < a_i \\
(-\infty, v_i] & \text{if } b_{n+1} = b_i > 0 \text{ and } a_{n+1} > a_i \\
\mathbb{R} & \text{if } b_{n+1} = b_i = 0 \text{ and } |a_{n+1}| \le |a_i| \\
\emptyset & \text{if } b_{n+1} = b_i = 0 \text{ and } |a_{n+1}| > |a_i|
\end{cases} \qquad (5)
$$

Vovk et al. described the application of the CP framework to ridge regression, least-squares regression and nearest neighbors regression in [60]. Recent work in applying the CP framework to regression settings has included the definition of newer ways to compute non-conformity scores for nearest neighbors regression [49], as well as the application of the framework to new problems such as network intrusion prediction [16].

2.2 Rationale and motivation

The fusion of information from multiple sources can happen at different levels using different methods, as summarized in Fig. 2 [30].


Algorithm 2 Conformal Predictors for Ridge Regression

Require: Training set T = {(x_1, y_1), ..., (x_n, y_n)}; new example x_{n+1}; confidence level ε; matrix X = (x_1, x_2, ..., x_{n+1}); regularization parameter α
1: Calculate C = I − X(X'X + αI)^{-1} X'
2: Let A = C(y_1, y_2, ..., y_n, 0)' = (a_1, a_2, ..., a_{n+1})
3: Let B = C(0, 0, ..., 0, 1)' = (b_1, b_2, ..., b_{n+1})
4: for i = 1 to n+1 do
5:   Calculate u_i and v_i as follows.
     If b_i ≠ b_{n+1}, then u_i = min((a_i − a_{n+1})/(b_{n+1} − b_i), −(a_i + a_{n+1})/(b_{n+1} + b_i)) and v_i = max((a_i − a_{n+1})/(b_{n+1} − b_i), −(a_i + a_{n+1})/(b_{n+1} + b_i)).
     If b_i = b_{n+1}, then u_i = v_i = −(a_i + a_{n+1})/(2 b_i).
6: end for
7: for i = 1 to n+1 do
8:   Compute S_i according to (5).
9: end for
10: Sort (−∞, u_1, u_2, ..., u_{n+1}, v_1, ..., v_{n+1}, ∞) in ascending order, obtaining ŷ_0, ..., ŷ_{2n+3}.
11: Output ∪_i [ŷ_i, ŷ_{i+1}] such that N(ŷ_i)/(n+1) > 1 − ε, where N(ŷ_i) = count{S_j : [ŷ_i, ŷ_{i+1}] ⊆ S_j}, i = 0, ..., 2n+2, and j = 1, ..., n+1.
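A simplified sketch of Algorithm 2 is given below. It covers only the two generic branches of (5) (assuming b_{n+1} ≠ b_i for every training point), ignores the unbounded tail intervals of Step 10, and treats the new example's own set as the whole real line; all names are ours and the code is illustrative rather than the authors' implementation.

```python
# Sketch of Algorithm 2 (ridge regression conformal predictor), restricted to the
# generic case of (5). Illustrative only.
import numpy as np

def ridge_cp_interval(X_train, y_train, x_new, epsilon=0.95, ridge_alpha=1.0):
    X = np.vstack([X_train, x_new])                       # (n+1) x d design matrix
    n1, d = X.shape
    C = np.eye(n1) - X @ np.linalg.inv(X.T @ X + ridge_alpha * np.eye(d)) @ X.T
    a = C @ np.append(y_train, 0.0)                       # steps 2-3
    b = C @ np.append(np.zeros(len(y_train)), 1.0)

    # Step 5: breakpoints u_i, v_i and the sets S_i from (5); only the first two
    # branches (b_{n+1} > b_i and b_{n+1} < b_i) are handled here.
    S, points = [], []
    for i in range(n1 - 1):
        r1 = (a[i] - a[-1]) / (b[-1] - b[i])
        r2 = -(a[i] + a[-1]) / (b[-1] + b[i])
        u, v = min(r1, r2), max(r1, r2)
        points += [u, v]
        S.append(('inside', u, v) if b[-1] > b[i] else ('outside', u, v))

    # Steps 10-11: count how many S_j cover each elementary interval between
    # consecutive breakpoints; keep intervals whose coverage ratio exceeds 1 - eps.
    grid = sorted(points)
    region = []
    for lo, hi in zip(grid[:-1], grid[1:]):
        mid = 0.5 * (lo + hi)
        covered = 1 + sum((kind == 'inside' and u <= mid <= v) or
                          (kind == 'outside' and (mid <= u or mid >= v))
                          for kind, u, v in S)   # "+1": the new example always conforms to itself
        if covered / n1 > 1 - epsilon:
            region.append((lo, hi))
    return region
```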

Fig. 2 An overview of information fusion methods with details of decision-level fusion methods (adapted from [30])


Dasarathy [15] categorized these approaches as data-level fusion (where data is combined), feature-level fusion (where features are extracted from the data in different modalities separately, and these features are then combined), and decision-level fusion (where the fusion happens at the decision-making level). Data-level fusion and feature-level fusion are addressed together as early fusion, whereas decision-level fusion is also called late fusion. Based on this categorization of information fusion methods, we consider the possibilities in applying conformal predictors to information fusion in both early fusion and late fusion, as described below:

– Early fusion: The classifiers (or regressors) corresponding to the data sources are first combined using standard fusion techniques before the conformal predictions framework is applied to the ensemble.

– Late fusion: Conformal predictions are obtained with each classifier (or regressor) corresponding to a unique data source, and these predictions (including non-conformity measure values and p-values) are then combined in a second stage to obtain conformal predictions at a decision level.

Evidently, the application of conformal predictors to the early fusion setting is straightforward. Vovk et al. [60] suggested that a suitable non-conformity measure can be defined after the outputs of each of the classifiers have been combined. For example, in the case of an ensemble classifier such as boosting, the non-conformity measure can be defined as:

$$\sum_{t=1}^{T} \beta_t B_t(x, y) \qquad (6)$$

where B_t(x, y) is the non-conformity score of (x, y) computed from a weak classifier h_t (which can be any classifier, such as a decision tree or k-Nearest Neighbor), and the β_t are the weights learnt by the boosting algorithm. This non-conformity measure can be directly used in the CP framework to obtain calibrated conformal predictions.
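As a minimal illustration of (6), the sketch below combines weak-learner non-conformity scores with their boosting weights; the scores and weights shown are made up.

```python
# Sketch of the boosted non-conformity measure (6): a beta-weighted sum of the
# weak learners' non-conformity scores for a candidate pair (x, y). Illustrative only.
import numpy as np

def boosted_nonconformity(weak_scores, betas):
    """weak_scores[t] = B_t(x, y); betas[t] = weight learnt by boosting."""
    return float(np.dot(betas, weak_scores))

# Example: three weak learners, with the second trusted most.
print(boosted_nonconformity(weak_scores=[0.8, 0.1, 0.4], betas=[0.2, 0.5, 0.3]))
```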

However, the late fusion approach, which is the focus of this work, has not been addressed earlier, and is of value in risk-sensitive application contexts in information fusion. To illustrate the applicability of the late fusion scenario, let us consider the problem of multimodal person recognition, i.e., the task of recognizing the identity of an individual using, say, both face and speech data. In the early fusion case, the CP framework is applied to the combined outputs from the face and speech classifiers. However, if the user would like to understand which of the modalities resulted in errors (so that appropriate action can be taken, such as an additional training phase for that modality), it would be essential to have a measure of confidence for each of the modalities, and understand how they contributed to the net confidence. In other words, confidence can be viewed as being computed at an entity level and at an attribute level in an information fusion context, where an entity such as a person is understood to be made up of several attributes such as face and speech. While the early fusion approach computes only the entity-level confidence, the late fusion approach can provide both an attribute-level confidence and an entity-level confidence, thus providing higher value to the end user. This approach has challenges, since an appropriate methodology to combine the conformal predictions (or, equivalently, non-conformity measures or p-values) needs to be identified. We now outline our approach to combine conformal predictors from multiple classifiers (and regression methods) for information fusion.


3 Conformal predictors for information fusion: methodology

Our proposed methodology for applying conformal predictors in information fusion settings is premised on multiple hypothesis testing [57]. We propose that each data source considered for fusion can be formulated as an independent hypothesis test, and the p-values obtained from each hypothesis test can be combined using established statistical methods [38] (described later in this section). We now describe how this approach can be applied in classification and regression settings.

Classification Given a new test data instance, the Conformal Predictions (CP) framework outputs a p-value for every class label, as described in Section 2.1. When there are multiple data sources describing a single class label entity (e.g. different modalities like face and speech, or different feature spaces obtained from a single face image, for person recognition), we use a classifier for each individual data source with appropriate non-conformity measures, and obtain p-values for each class label uniquely for each data source. Thus, for every class label y(j), j ∈ {1, ..., M}, we have an individual null hypothesis for each data source, H_{01}, H_{02}, ..., H_{0N}, where M is the number of class labels and N is the number of data sources. In other words, for every class label y(j) we obtain N p-values, p(i), i = 1, ..., N (one for each modality). These p-values are then combined into a new test statistic C(p(1), ..., p(N)) (addressed as C in the remainder of this work), which is used to test the combined null hypothesis H_0 for class label y(j) (methods to compute C from the individual p-values are discussed later in this section). We note that the combined null hypothesis, H_0, is that each of the individual null hypotheses H_{01}, H_{02}, ..., H_{0N} is true. The conformal prediction region at a specified confidence level, Γε, is then presented as the set containing all the class labels with a combined p-value greater than 1 − ε.

Regression In the regression setting, for a given data source, the CP framework outputs a union of intervals for the output variable y (as in Algorithm 2 Step 11), and a p-value is associated with each of these intervals. When there are N data sources, we obtain N sets of intervals, say, R(i), i = 1, ..., N, for y. To apply the CP framework in this setting, we consider the set, I, of all non-empty intervals of the form I(1) ∩ I(2) ∩ · · · ∩ I(N), where I(1) ∈ R(1), I(2) ∈ R(2), ..., I(N) ∈ R(N). For each interval in I, we identify the corresponding p-value for each data source, p(i), i = 1, ..., N, and then combine them into a new test statistic C(p(1), ..., p(N)), which is then used to test the combined null hypothesis for that particular interval. Similar to the classification setting, the conformal prediction region for a specified confidence level, Γε, is then presented as the union of intervals with a combined p-value greater than 1 − ε.

Our methodology for conformal predictors in information fusion is summarized in Algorithms 3 and 4. Evidently, this methodology retains the inherent generalizability of the CP framework to all classification and regression methods, and is relevant to all applications that require inference from multiple data sources.

3.1 Combining P-values from multiple hypothesis tests

Multiple hypothesis testing has been studied for several decades now, and some of the most established methods that are used to combine p-values from multiple tests include Tippett's method [58], Fisher's method [24], Wilkinson's method [61], Liptak's method [36], Lancaster's method [34], Edgington's method [20], Mudholkar and George's method [43], and other weighted combination methods [28, 42].


Algorithm 3 Conformal Predictors for Information Fusion (Classification)

Require: Number of data sources N; training sets for each data source T_1 = {(x(1)_1, y_1), ..., (x(1)_n, y_n)}, ..., T_N = {(x(N)_1, y_1), ..., (x(N)_n, y_n)}, where x(j)_i is the ith data point belonging to the jth data source and y_i is the class label of the ith data point; number of classes M; class labels y(i) ∈ Y = {y(1), y(2), ..., y(M)}; classifiers Θ_1, ..., Θ_N for each data source; confidence level ε
1: Get the new unlabeled example w.r.t. each data source: x(1)_{n+1}, ..., x(N)_{n+1}.
2: Using Algorithm 1 and the classifiers Θ_1, ..., Θ_N corresponding to each data source, compute p-values p(i)_j, where i = 1, ..., N corresponds to the ith data source and j = 1, ..., M corresponds to the jth class label.
3: for each class label y(j), j = 1, ..., M do
4:   Compute the p-value p_j of the combined hypothesis from the N modalities using the methods described in Section 3.1.
5: end for
6: Output the conformal prediction regions Γε = {y(j) : p_j > 1 − ε, y(j) ∈ Y}.
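The per-class fusion step of Algorithm 3 can be sketched as follows, with the combination function C left pluggable (the concrete choices are described in Section 3.1); the dictionary layout and the example p-values are ours.

```python
# Sketch of Algorithm 3: per-class p-values from N sources are combined into a
# single p-value per class, then thresholded at 1 - epsilon. Illustrative only.
def fuse_classification(per_source_pvalues, combine, epsilon=0.95):
    """per_source_pvalues: list of dicts, one per data source, mapping label -> p-value.
    combine: function mapping a list of p-values to one combined p-value (Section 3.1)."""
    labels = per_source_pvalues[0].keys()
    fused = {lab: combine([p[lab] for p in per_source_pvalues]) for lab in labels}
    region = [lab for lab, p in fused.items() if p > 1 - epsilon]
    return fused, region

# Example with two sources (e.g. face and speech) and the maximum order statistic.
face = {'alice': 0.70, 'bob': 0.04}
speech = {'alice': 0.40, 'bob': 0.10}
print(fuse_classification([face, speech], combine=max, epsilon=0.9))
```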

Methods for multiple hypothesis testing can broadly be categorized into dependent and independent tests. In this work, we assume that the hypothesis tests across the different data sources are independent, and focus on such methods. We plan to consider dependent tests in our future work.

P-value combination methods for multiple independent hypothesis tests can be broadly categorized into quantile combination methods and order statistic methods [38]. In quantile combination methods, a relevant parametric Cumulative Distribution Function (CDF), F, is selected, and the p-values p_i are transformed into distributional quantiles, q_i = F^{-1}(p_i), where i = 1, 2, ..., N, for each of the class labels.

Algorithm 4 Conformal Predictors for Information Fusion (Regression)

Require: Number of data sources N; training sets for each data source T_1 = {(x(1)_1, y_1), ..., (x(1)_n, y_n)}, ..., T_N = {(x(N)_1, y_1), ..., (x(N)_n, y_n)}, where x(j)_i is the ith data point belonging to the jth data source and y_i is the label of the ith data point; regressors ϒ_1, ..., ϒ_N for each data source; confidence level ε
1: Get the new unlabeled example w.r.t. each data source: x(1)_{n+1}, ..., x(N)_{n+1}.
2: Using Algorithm 2 and the regressors ϒ_1, ..., ϒ_N corresponding to each data source, compute a set of intervals R(i), i = 1, ..., N, for each data source, and the p-value associated with each interval in R(i). (Note that the p-value is given by the ratio N(ŷ_i)/(n+1) in Algorithm 2 Step 11.)
3: Compute the set, I, of all non-empty intervals of the form I(1) ∩ I(2) ∩ · · · ∩ I(N), where I(1) ∈ R(1), I(2) ∈ R(2), ..., I(N) ∈ R(N).
4: for each interval I_j in I do
5:   Identify the p-value p(i), i = 1, ..., N, corresponding to I_j in each data source (as computed in Step 2), and compute the p-value p_j of the combined hypothesis from the N modalities for I_j using the methods described in Section 3.1.
6: end for
7: Output the conformal prediction regions Γε = ∪_{I_j ∈ I, p_j > 1−ε} I_j.
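The interval bookkeeping in Steps 3-7 of Algorithm 4 is sketched below, representing each prediction interval as a (lo, hi, p) triple; this representation and the example numbers are ours.

```python
# Sketch of Algorithm 4 steps 3-7: intersect the per-source prediction intervals,
# combine the attached p-values, and keep intersections whose fused p-value
# exceeds 1 - epsilon. Intervals are (lo, hi, p) triples; illustrative only.
from itertools import product

def fuse_regression(per_source_intervals, combine, epsilon=0.95):
    region = []
    for combo in product(*per_source_intervals):        # one interval from each source
        lo = max(iv[0] for iv in combo)
        hi = min(iv[1] for iv in combo)
        if lo >= hi:                                     # empty intersection
            continue
        p = combine([iv[2] for iv in combo])
        if p > 1 - epsilon:
            region.append((lo, hi))
    return region

# Example: two sources (e.g. grayscale and Gabor features), max-combined p-values.
src1 = [(-10.0, 10.0, 0.60), (20.0, 30.0, 0.08)]
src2 = [(-5.0, 15.0, 0.50)]
print(fuse_regression([src1, src2], combine=max, epsilon=0.9))   # [(-5.0, 10.0)]
```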


These q_i are subsequently combined as C = Σ_i q_i, and the p-value of the combined test H_0 is computed from the sampling distribution of C. Examples of CDFs used in these methods include the chi-square [24, 34], standard normal [36], uniform [20] and logistic [43] distributions. On the other hand, order statistic methods use the fact that under the null hypothesis H_0, the p_i can be reordered as p_(i) such that p_(1) ≤ p_(2) ≤ · · · ≤ p_(N) represent order statistics from a U(0, 1) distribution (note that a p-value is assumed to be a uniformly distributed random variable on the interval [0, 1]). Then, a combining function C is defined as C = p_(r) for some r such that 1 ≤ r ≤ N. Common examples of order statistic methods are the minimum p-value (r = 1 [58]) and the maximum p-value (r = N [38]).

Based on earlier work in combining p-values for independent tests, we employed three categories of p-value combination methods in the validation of our methodology: quantile combination methods, order statistic methods, and learning-based methods, as described below.

Quantile Combination Methods We selected three kinds of quantile combination methodsfor our experimental studies:

– Standard Normal Function (SNF): In this approach, we compute the inverse of the normal CDF using the p-values obtained from the individual classifiers, i.e. q_i = F^{-1}(p_i) for i = 1, 2, ..., N. C is then obtained as Σ_i q_i, and the normal CDF is again used as the sampling distribution to compute the p-values at the fusion level. This was found to be the most suitable for general use in an earlier study [38].

– Non-conformity Aggregation (NCA): The non-conformity measure values computed in the CP framework can be viewed as the 'test statistic' leading to the computation of the p-values for each class label. Hence, instead of assuming a quantile function F^{-1} and then computing the q_i values, the non-conformity measures themselves can be used as the q_i. Similar to the previous approach, C is then obtained as Σ_i q_i, and the combined C values are then used as non-conformity measures at the fusion level to compute the p-values using the standard CP framework procedure ((2) for classification, and N(ŷ_i)/(n+1) in Algorithm 2 Step 11 for regression).

– Extended Chi-Square Function (ECF): Fisher proposed the chi-square quantile combination method to combine the p-values of independent tests in [24]. Jost [31] stated that when Fisher's derivation for the chi-square statistic is solved further analytically, the result is the expression in (7) below, where k is the product of p-values across all data sources, and m is the number of p-values under consideration. For more details, please see [6].

$$k \sum_{i=0}^{m-1} \frac{(-\ln k)^i}{i!} \qquad (7)$$

We call this the Extended Chi-Square Function (ECF) method in our work. The chi-square CDF was also recommended for general use, along with the standard normal function, by Loughin in a study of such methods [38].

Order Statistic Methods We selected two established order statistic methods:

– Minimum Order Statistic (MIN): The minimum of the p-values corresponding to each data source, the 1st order statistic p_(1), is used in this method. This method provided the best results among order statistic methods in an earlier study [38] and hence is used in our work.


– Maximum Order Statistic (MAX): The maximum of the p-values corresponding to each data source, the largest order statistic, is used in this method. This method is understood to perform well only when all null hypotheses are equally false; however, for the sake of completeness, we include this method in our work.

Learning-Based Methods In a recent empirical study on image saliency detection, Ali et al. [1] used a hierarchical learning-based p-value combination method, where the p-values obtained using different image features are used as an input vector to a Support Vector Machine (SVM), on which the CP framework is subsequently used to obtain p-values at the fusion level. We include a similar approach in our validation:

– k-Nearest Neighbor (KNN): The p-values computed from the individual data sources are provided as input to a k-NN classifier, and the CP framework is applied to the k-NN to obtain the p-values at the fusion level.

The aforementioned six methods (SNF, NCA, ECF, MIN, MAX, KNN) are used in this work to combine the conformal predictors from individual classifiers and regressors; a compact sketch of several of these combination functions is given below. Our experimental results are presented in the following section.
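The sketch below covers SNF, ECF, MIN and MAX under the stated independence assumption (NCA needs the raw non-conformity scores and KNN needs a second-stage classifier, so both are omitted). The division by sqrt(N) in snf(), i.e. the Stouffer/Liptak form, is our reading of "the normal CDF as the sampling distribution" and is not spelled out in the text.

```python
# Sketches of p-value combination functions for independent tests. Illustrative only.
import math
from scipy.stats import norm

def snf(pvalues):
    """Standard Normal Function: sum of normal quantiles, mapped back through the normal CDF."""
    q = [norm.ppf(p) for p in pvalues]
    return float(norm.cdf(sum(q) / math.sqrt(len(q))))

def ecf(pvalues):
    """Extended Chi-Square Function, equation (7): k * sum_{i<m} (-ln k)^i / i!."""
    k = math.prod(pvalues)
    m = len(pvalues)
    return k * sum((-math.log(k)) ** i / math.factorial(i) for i in range(m))

def p_min(pvalues):      # minimum order statistic (Tippett)
    return min(pvalues)

def p_max(pvalues):      # maximum order statistic
    return max(pvalues)

p = [0.40, 0.70]         # e.g. face and speech p-values for one class label
print(snf(p), ecf(p), p_min(p), p_max(p))
```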

4 Experiments and results

We validated the proposed methodology to extend conformal predictors to information fusion settings on two real-world applications: multiple-modality fusion for person recognition (classification setting), and multiple-feature fusion for head pose estimation (regression setting). The experiments and results obtained in each of these application domains are described individually below. In all experiments in this work, the results are averaged over 10 trials to address any randomness bias.

4.1 Classification: multi-modal fusion for person recognition

With growing concerns about security and privacy, the need to reliably estimate the identity of an individual has become very pronounced. Biometric systems rely on the evidence provided by face, voice, fingerprint, signature and other modalities to verify and validate the identity claimed by an individual. While unimodal biometric systems (that rely on a single modality) can be limited by factors such as noisy data, environmental conditions (e.g., changes in ambient lighting) or spoof attacks, the use of multiple modalities for person recognition can increase the viability of the system and the range of environments in which it can operate. Moreover, with the growing risk associated with misclassification in current times, obtaining a measure of confidence with the output prediction is of immense value to the user communities.

In this work, we focus our efforts on multimodal person recognition using face and speech data. While several methods have been studied in the past for fusion of face and speech for robust person recognition (summarized in Table 1), none of these methods provide calibrated measures of confidence in the predictions as obtained using the CP framework, thus necessitating this work on conformal predictors in multimodal fusion settings.


Table 1  Summary of approaches in existing work towards fusion of face and speech-based person recognition

[9, 19]           Bayesian approach with SVMs
[51]              Logical AND
[12]              Weighted geometric average
[10, 14, 23, 47]  Linear weighted summation
[22]              Adaptive modality weighting model called Cumulative Ratio of Correct Matches (CRCM)
[29]              Modality weighting based on estimates of the probability density function of scores under a Gaussian assumption
[26]              Cascaded approach where the outputs are weighted by the confidence scores
[39]              Weighting modality scores, where the weight is proportional to the recognition rate

4.1.1 Data setup

The VidTIMIT [54] and the MOBIO (Mobile Biometry, http://www.mobioproject.org) datasets are used to validate the proposed methodology. Both these databases contain frontal images of subjects under natural conditions. The VidTIMIT dataset contains video recordings of 43 subjects reciting short sentences, with approximately 10 videos per subject. These videos contained between 100-110 frames each, leading to a total of over 43000 data instances. More details of the dataset can be found in [54]. The MOBIO dataset was created for the MOBIO challenge to test the performance of state-of-the-art face and speech recognition algorithms. It contains videos of 160 subjects captured using a mobile phone camera under challenging real-world conditions, with 5 videos for each subject in the development set, which was used for our studies. These videos contained between 80-200 frames each, leading to a total of over 80000 data instances. More details of the MOBIO dataset can be found in [40]. For each of the datasets, we randomly sampled 500 data instances for training and 1000 data instances for testing, and averaged our results over 10 independent trials to remove any randomness bias.

For both these datasets, automated face cropping was performed to crop out the face regions [3] (in the VidTIMIT dataset, each of the videos was first sliced and stored as JPEG images of resolution 512 by 384). To extract the facial features, block-based discrete cosine transform (DCT) was used (similar to [21]). Each image was subdivided into 8 by 8 non-overlapping blocks, and the DCT coefficients of each block were then ordered according to the zigzag scan pattern. The DC coefficient was discarded for illumination normalization, and the first 10 AC coefficients of each block were selected to form compact local feature vectors. Each local feature vector was normalized to unit norm. Concatenating the features from the individual blocks yielded the global feature vector for the entire image. The cropped face image had a resolution of 128 by 128, and thus the dimensionality of the extracted feature vector was 2560. Principal Component Analysis (PCA), a commonly accepted step in face recognition techniques, was then applied to reduce the dimension to 100, retaining about 99 % of the variance. Support Vector Machines (SVM) were used as the classifier of choice for face data in both these datasets. The Lagrange multipliers obtained while training an SVM are a straightforward choice to consider as non-conformity scores, as pointed out by Vovk et al. [60].



The Lagrange multipliers' values are zero for examples outside the margin on the correct side, and lie between 0 and a positive constant, C, for other examples, thereby providing a natural monotonic measure of non-conformity w.r.t. the corresponding class.
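A sketch of the block-DCT face feature extraction described above is given below, using SciPy's DCT on 8 × 8 blocks; the diagonal ordering is a simplified stand-in for the JPEG zigzag scan, and the random image is only a placeholder for a cropped 128 × 128 face.

```python
# Sketch of the block-based DCT face features: 8x8 blocks, DCT, near-zigzag-ordered
# coefficients, DC discarded, first 10 AC coefficients kept and L2-normalised per block.
# Illustrative code, not the authors' implementation.
import numpy as np
from scipy.fft import dctn

def zigzag_indices(n=8):
    # Order (row, col) pairs by anti-diagonal, a close stand-in for the zigzag scan.
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1], rc[0]))

def dct_face_features(img, block=8, n_ac=10):
    order = zigzag_indices(block)[1:1 + n_ac]             # drop the DC coefficient
    feats = []
    h, w = img.shape
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            coeffs = dctn(img[r:r + block, c:c + block], norm='ortho')
            v = np.array([coeffs[i, j] for i, j in order])
            feats.append(v / (np.linalg.norm(v) + 1e-12))  # unit-norm local vector
    return np.concatenate(feats)

face = np.random.rand(128, 128)                            # stand-in for a cropped face
print(dct_face_features(face).shape)                       # (2560,) = 256 blocks * 10
```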

The speech data components of the VidTIMIT and MOBIO datasets were processed as described by Nolazco-Flores et al. in [45]. The speech signal was downsampled to 8 KHz and a short-time 256-pt Fourier analysis was performed on a 25 ms Hamming window (10 ms frame rate). Every log-energy frame was subsequently tagged as high, medium or low (low and 80 % of the medium log-energy frames were discarded). The magnitude spectrum was transformed to a vector of Mel-Frequency Cepstral Coefficients (MFCCs), and a feature warping algorithm was applied on the obtained features. A gender-dependent 512-mixture Gaussian Mixture Model (GMM) Universal Background Model was then initialised using the k-means clustering algorithm and trained by estimating the GMM parameters via the Expectation Maximization algorithm. Target-dependent models were then obtained with MAP (maximum a posteriori) speaker adaptation. Finally, the score computation followed a hypothesis test framework. The negatives of the likelihood values generated by the GMM were used as the non-conformity scores, as suggested by Vovk et al. in [60]. For more implementation details, please refer to [40] for video processing and [45] for speech processing.
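The speech-side scoring can be sketched as below, with scikit-learn's GaussianMixture standing in for the gender-dependent UBM and MAP-adapted speaker models described above; the negative log-likelihood is used as the non-conformity score, as suggested in the text, and the MFCC dimensionality, mixture count and toy data are our assumptions.

```python
# Sketch of the speech-side non-conformity score: fit one GMM per enrolled speaker
# on MFCC frames and use the negative log-likelihood of a test utterance as its
# non-conformity w.r.t. that speaker. Illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_gmms(mfcc_by_speaker, n_components=8, seed=0):
    return {spk: GaussianMixture(n_components, covariance_type='diag',
                                 random_state=seed).fit(frames)
            for spk, frames in mfcc_by_speaker.items()}

def speech_nonconformity(gmm, mfcc_frames):
    # Higher value = the utterance fits this speaker's model worse.
    return -float(gmm.score_samples(mfcc_frames).mean())

# Toy usage with random 13-dimensional "MFCC" frames for two speakers.
rng = np.random.default_rng(0)
data = {'spk_a': rng.normal(0, 1, (200, 13)), 'spk_b': rng.normal(3, 1, (200, 13))}
models = train_speaker_gmms(data)
test = rng.normal(0, 1, (50, 13))                 # resembles spk_a
print({spk: speech_nonconformity(m, test) for spk, m in models.items()})
```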

4.1.2 Results: performance of base classifiers and calibration of errors in individual modalities

Before studying the performance of our methodology in combining the p-values of the individual classifiers, the performance of the base classifiers on the data from the individual modalities, as well as the calibration of errors when the CP framework is applied to the individual modalities, were observed. The SVM classifier provided accuracies of 94.5 % and 94.1 % on the face/video data from the VidTIMIT and MOBIO datasets respectively. However, the GMM classifier provided lower accuracies on the speech data: 44 % and 42 % on the VidTIMIT and MOBIO datasets respectively. (Note that we used the same training and test data for all the experiments reported here for fairness of comparison.)

Fig. 3 Results obtained on face/video data of the VidTIMIT dataset (SVM classifier)


Fig. 4 Results obtained on speech data of the VidTIMIT dataset (GMM classifier)

The results obtained by applying the CP framework on the individual modalities for both datasets are shown in Figs. 3, 4, 5 and 6. These figures show that the frequency of errors is bounded by the specified confidence level (maintaining the validity of the framework), but also indicate a lack of tight calibration at high confidence levels.

4.1.3 Results: calibration of errors under information fusion

Each of the six methods outlined in Section 3 was used to combine the p-values obtained from the CP framework using the individual face and speech modalities. The combined p-values were subsequently used to obtain a new set of predictions.

Fig. 5 Results obtained on face/video data of the MOBIO dataset (SVM classifier)


Fig. 6 Results obtained on speech data of the MOBIO dataset (GMM classifier)

The calibration of the fused conformal predictors was studied at different confidence levels, and the results are presented in Tables 2 and 3. It is evident that all the methods, except MIN, result in error rates that are bounded appropriately by the confidence level. However, it is important to note that the Standard Normal Function method provides the tightest calibration results (i.e. the error rate is consistently close to 1 − ε, where ε is the confidence level) across both datasets. We conclude from our empirical study in the classification setting that quantile combination methods hold relatively higher promise towards maintaining calibration in classifier-based information fusion, acknowledging that it may be possible to find other quantile functions or appropriate parameter values that show similar (or better) performance, while the same may not be easily said of order statistic methods. Moreover, the vast body of existing work in multiple hypothesis testing using quantile combination methods may be useful in investigating theoretical guarantees of conformal predictors in information fusion, which we will study in future work. It is also worth mentioning that learning-based methods based on classifiers other than k-Nearest Neighbors may provide better calibration performance, and will be investigated in our future work.

Table 2  Fusion results on the VidTIMIT dataset. The combination methods have been described in Section 3

Combination   Percentage of errors at confidence level
method        50 %      60 %      70 %      80 %      90 %     95 %     99 %
SNF           44.46 %   35.37 %   25.79 %   14.91 %   2.59 %   0.82 %   0.80 %
NCA           48.05 %   35.08 %   20.10 %   4.35 %    0.80 %   0.80 %   0.80 %
ECF           42.21 %   26.49 %   9.89 %    1.12 %    0.80 %   0.80 %   0.80 %
MIN           72.00 %   59.08 %   36.80 %   6.04 %    0.80 %   0.80 %   0.80 %
MAX           16.42 %   6.32 %    0.09 %    0.00 %    0.00 %   0.00 %   0.00 %
KNN           1.04 %    0.45 %    0.39 %    0.39 %    0.39 %   0.39 %   0.39 %

For k-NN, k = 5 provided the best results, which are listed here.


Table 3  Fusion results on the MOBIO dataset. The combination methods have been described in Section 3

Combination   Percentage of errors at confidence level
method        50 %      60 %      70 %      80 %      90 %     95 %     99 %
SNF           46.05 %   37.73 %   28.92 %   20.49 %   7.92 %   2.18 %   0.91 %
NCA           46.02 %   33.68 %   17.24 %   4.04 %    0.93 %   0.91 %   0.91 %
ECF           44.05 %   33.91 %   21.30 %   7.54 %    1.00 %   0.91 %   0.91 %
MIN           72.89 %   58.17 %   43.82 %   25.81 %   3.14 %   0.91 %   0.91 %
MAX           13.17 %   4.32 %    0.72 %    0.00 %    0.00 %   0.00 %   0.00 %
KNN           0.68 %    0.44 %    0.44 %    0.44 %    0.44 %   0.44 %   0.44 %

We obtained the same results for different values of k in k-NN.

4.2 Regression: multi-feature fusion for head pose estimation

We now present the results of applying our methodology to information fusion in regression settings, using head pose estimation as the real-world application. Head pose estimation has been studied as an integral part of biometrics and surveillance systems for many years, with applications to 3D face modeling, gaze direction detection, and pose-invariant person identification from face images. The estimation of head pose angle from face images, independent of the identity of the individual, plays an important role in the ability of face-based biometric systems to handle significant head pose variations. In addition, head pose estimation has other applications ranging from driver monitoring to attention detection in next-generation teleconferencing systems. While coarse pose angle estimation from face images has been reasonably successful in recent years [11], accurate person-independent head pose estimation from face images is a challenging problem, and continues to elicit effective solutions. One of the approaches proposed in the literature for robust head pose estimation is the integration of multiple image features [6], which we use in this work to validate our methodology for extending conformal predictors to information fusion in regression settings.

4.2.1 Data setup

The FacePix database [37], illustrated in Fig. 7, has been used in this work for experiments and evaluation. This database is publicly available (http://www.facepix.org/), and has been used earlier by other researchers for head pose estimation [2, 44].

Our experiments in this work were performed with a set of 2184 face images, consisting of 24 individuals with pose angles varying from −90° to +90° in increments of 2°. The images were subsampled to 32 × 32 resolution, and three different feature spaces of the images were considered for the experiments: (i) grayscale pixel intensity values, (ii) the Laplacian of Gaussian (LoG) transformed image feature space (as in [7]), and (iii) the Gabor filter transformed image feature space (as in [6]). The LoG transform captures the edge map of the face images, while Gabor filters capture the texture information. The images were subsequently rasterized and normalized.


Fig. 7 Sample face images with varying pose from the FacePix database

Our proposed method for conformal predictors in information fusion under regression settings, described in Algorithm 4, was then applied with these three image features (grayscale, LoG and Gabor) as the data sources. The p-value combination methods described in Section 3 were used to combine the p-values from the data sources. Since this is a regression problem, the k-NN classifier-based combination method was not used in this setting.
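The three feature spaces used as data sources can be sketched as follows, using SciPy's Laplacian-of-Gaussian filter and scikit-image's Gabor filter; the sigma, frequency and orientation values are illustrative choices, not taken from the paper.

```python
# Sketch of the three per-image feature spaces (data sources) for head pose
# estimation: raw grayscale, Laplacian-of-Gaussian, and Gabor responses. Illustrative only.
import numpy as np
from scipy.ndimage import gaussian_laplace
from skimage.filters import gabor

def head_pose_feature_spaces(img):
    gray = img.ravel()
    log_feat = gaussian_laplace(img, sigma=2.0).ravel()          # edge map
    gabor_feat = np.concatenate([gabor(img, frequency=0.25, theta=t)[0].ravel()
                                 for t in (0.0, np.pi / 2)])     # texture, 2 orientations
    # Normalise each rasterised feature vector, as described in the text.
    return [v / (np.linalg.norm(v) + 1e-12) for v in (gray, log_feat, gabor_feat)]

img = np.random.rand(32, 32)                                     # stand-in for a face image
print([v.shape for v in head_pose_feature_spaces(img)])          # [(1024,), (1024,), (2048,)]
```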

4.2.2 Results: performance of base regressors and calibration of errors in individual modalities

As for classification, the performance of the base regressors on the data from the individual data sources, as well as the calibration of errors when the CP framework is applied to the individual data sources, were observed. Using ridge regression as the base regressor, we obtained a mean absolute error of 5.75°, 4.84° and 4.62° for the grayscale, LoG and Gabor feature spaces respectively. These results showed high promise for ridge regression as an effective method for head pose estimation, and justified the use of this method in our studies. The calibration of the CP framework using ridge regression was subsequently studied with respect to each of the individual data sources (image features). The results obtained are shown in Table 4, and demonstrate very good calibration performance for each of the data sources.

4.2.3 Results: calibration of errors under information fusion

The methods outlined in Section 3 were again used to combine the p-values obtained from the CP framework using the individual data sources (with the exception of the KNN method, as mentioned earlier). The combined p-values were subsequently used to obtain a new set of predictions, and the calibration of the fused conformal predictors was studied at different confidence levels. The results are presented in Table 5. The α regularization parameter for ridge regression was varied between 0 and 1, and the best results obtained are reproduced here. Similar to our results for classification, the MIN method does not maintain calibration in this setting. In addition, we observe that the Standard Normal Function method, which performed well in classification, and the ECF method do not show calibration (although the ECF method performs better at lower confidence levels).

Table 4  Calibration results of the individual features considered in the FacePix dataset using the CP framework with ridge regression

Image          Percentage of errors at confidence level
feature        50 %      60 %      70 %      80 %      90 %      95 %     99 %
Grayscale      51.16 %   40.84 %   30.04 %   20.22 %   9.88 %    4.96 %   1.02 %
LoG            50.56 %   40.48 %   30.88 %   20.34 %   10.14 %   4.80 %   1.12 %
Gabor Filters  50.30 %   40.72 %   29.82 %   19.66 %   10.08 %   4.60 %   1.14 %


Table 5  Fusion results on the FacePix dataset for the regression setting

Combination   Percentage of errors at confidence level
method        50 %      60 %      70 %      80 %      90 %      95 %      99 %
SNF           52.10 %   48.34 %   43.90 %   39.56 %   32.66 %   28.26 %   19.82 %
NCA           52.20 %   40.26 %   28.16 %   17.02 %   8.46 %    3.58 %    0.58 %
ECF           43.20 %   36.84 %   30.16 %   23.28 %   15.94 %   11.34 %   5.44 %
MIN           61.54 %   52.32 %   41.56 %   28.72 %   14.80 %   8.22 %    1.92 %
MAX           39.86 %   29.12 %   19.02 %   12.20 %   5.54 %    2.04 %    0.34 %

The combination methods have been described in Section 3.

A quantile combination method, the Non-Conformity Aggregation method, provides the best calibration results, followed by the MAX function method. We hypothesize that since the ridge regression conformal predictors provided very good calibration performance with respect to each of the individual data sources, the aggregation of the corresponding non-conformity scores leads to good calibration performance at the fusion level. In other words, it is possible that the NCA method will perform well at the fusion level only when the non-conformity scores at the individual data source level lead to good calibration performance. This observation will be investigated further in our future work.

5 Conclusions and future work

In this work, we have proposed a new methodology for applying the Conformal Predictions framework in information fusion settings, including both classification and regression. This methodology relates each data source to an independent hypothesis test, where the p-values obtained for the data sources using the framework are subsequently combined using established p-value combination methods, as well as learning-based methods. Our methodology was studied in the context of two challenging real-world applications: (i) person recognition using multiple modalities, face/video and speech (classification setting), and (ii) head pose estimation using multiple image features from face images (regression setting). Our experimental results point to the inference that while order statistic methods (MAX, in particular) - and learning methods in the case of classification - maintain low error rates, quantile combination methods (such as the Standard Normal Function method and the Non-Conformity Aggregation method) provide the most statistically valid calibration results. We conclude that quantile combination methods provide the highest promise in combining p-values to extend conformal predictors to information fusion contexts. Our inference also resonates with an earlier study conducted by Loughin in [38]. Our studies also revealed that the MIN method, a first-order statistic method, is not suitable for maintaining calibration in such settings. We note that the MIN method performs poorly when the prediction corresponding to the minimum p-value among the data sources is incorrect, and the method also does not have impact on the fusion results when the p-values for the individual data sources are very close to each other (both Loughin in [38], as well as Mudholkar and George in [43], observed that the minimum method is weak when evidence against the composite null hypothesis is evenly distributed). Also, in the case of Non-Conformity Aggregation, our results for regression settings showed that this method is likely to perform well at the fusion level when the non-conformity scores at the individual data source level lead to good calibration performance.

An important issue in the fusion of information from multiple data sources is the correlation of data originating from these sources. In this work, we assumed the hypothesis tests corresponding to each data source to be mutually independent. However, it is expected that the results will vary if there is significant correlation between the data sources. This will form an important objective of our future work. Pesarin summarized methods for multiple hypothesis testing when the individual tests are dependent in [50], which we will explore in our further studies. In addition, we assumed in this work that the p-values from the individual tests have equal importance. In contrast, weighted combination methods such as those in [28, 42] can be used in application contexts where it is necessary to weight each data source differently (for instance, based on the reliability of the data source). We will study such weighted methods as part of our future work. We also plan to explore a larger set of quantile functions, learning-based fusion methods, and corresponding parameter values in our future efforts.

Acknowledgments We would like to thank the anonymous reviewers for their invaluable feedback in identifying errors in the article, as well as in improving its presentation. We would also like to thank Dr Juan Arturo Nolazco, Leibny Paola Garcia and Roberto Aceves at Tecnologico de Monterrey, Mexico, for their kind support in processing the speech modality of the VidTIMIT and MOBIO datasets and providing us with feature vectors for analysis in this work.

This material is based upon work supported by the National Science Foundation under Grant No. 1116360. The authors would like to thank the National Science Foundation for their support. Any opinions, findings, and conclusions or recommendations expressed in this material, however, are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

1. Ali, H., Antenreiter, M., Auer, P., Csurka, G., de Campos, T., Hussain, Z., Laaksonen, J., Ortner, R., Pasupa, K., Perronnin, F., Saunders, C., Shawe-Taylor, J., Viitaniemi, V.: Description, analysis and evaluation of confidence estimation procedures for sub-categorization. Tech. Rep. D6.2.1, Xerox Research Center Europe (2009)
2. Bailly, K., Milgram, M.: 2009 special issue: boosting feature selection for neural network based regression. Neural Netw. 22(5–6), 748–756 (2009)
3. Balasubramanian, V., Chakraborty, S., Panchanathan, S.: Generalized query by transduction for online active learning. In: Proceedings of the International Conference on Computer Vision (ICCV 2009) Workshop on Online Learning for Computer Vision (2009)
4. Balasubramanian, V., Gouripeddi, R., Panchanathan, S., Vermillion, J., Bhaskaran, A., Siegel, R.: Support vector machine based conformal predictors for risk of complications following a coronary drug eluting stent procedure. In: Computers in Cardiology, pp. 5–8 (2009)
5. Balasubramanian, V., Panchanathan, S., Chakraborty, S.: Multiple cue integration in transductive confidence machines for head pose classification. In: Computer Vision and Pattern Recognition Workshops, CVPRW '08. IEEE Computer Society Conference, pp. 1–8 (2008)
6. Balasubramanian, V., Panchanathan, S., Chakraborty, S.: Multiple cue integration in transductive confidence machines for head pose classification. In: Computer Vision and Pattern Recognition Workshops, CVPRW '08. IEEE Computer Society Conference, pp. 1–8 (2008). doi:10.1109/CVPRW.2008.4563070
7. Balasubramanian, V., Ye, J., Panchanathan, S.: Biased manifold embedding: a framework for person-independent head pose estimation. In: Computer Vision and Pattern Recognition, CVPR '07. IEEE Conference, pp. 1–7 (2007). doi:10.1109/CVPR.2007.383280
8. Beiraghi, S., Ahmadi, M., Ahmed, M.S., Shridhar, M.: Application of fuzzy integrals in fusion of classifiers for low error rate handwritten numerals recognition. Int. Conf. Pattern Recog. (ICPR) 2, 2487 (2000)
9. Ben-Yacoub, S., Abdeljaoued, Y., Mayoraz, E.: Fusion of face and speech data for person identity verification. IEEE Trans. Neural Netw. 10(5), 1065–1074 (1999). doi:10.1109/72.788647
10. Bolme, D., Beveridge, J., Howe, A.: Person identification using text and image data. In: Biometrics: Theory, Applications, and Systems, BTAS 2007. First IEEE International Conference, pp. 1–6 (2007)
11. Brown, L.M., Tian, Y.L.: Comparative study of coarse head pose estimation. In: IEEE Workshop on Motion and Video Computing, Orlando, Florida, pp. 125–130 (2002)
12. Brunelli, R., Falavigna, D.: Person identification using multiple cues. IEEE Trans. Pattern Anal. Mach. Intell. 17, 955–966 (1995)
13. Buchanan, B.G., Shortliffe, E.H.: Rule Based Expert Systems: The Mycin Experiments of the Stanford Heuristic Programming Project (The Addison-Wesley Series in Artificial Intelligence). Addison-Wesley Longman Publishing Co., Inc. (1984)
14. Carrasco, M., Pizarro, L., Mery, D.: Bimodal biometric person identification system under perturbations. In: Advances in Image and Video Technology, pp. 114–127 (2007)
15. Dasarathy, B.V.: Decision Fusion. IEEE Computer Society Press, Los Alamitos, CA (1994)
16. Dashevskiy, M., Luo, Z.: Network traffic demand prediction with confidence. In: IEEE Global Telecommunications Conference, IEEE GLOBECOM 2008, pp. 1–5 (2008). http://dx.doi.org/10.1109/GLOCOM.2008.ECP.284
17. Dempster, A.: A generalization of Bayesian inference. In: Classic Works of the Dempster-Shafer Theory of Belief Functions, pp. 73–104 (2008)
18. Dubois, D., Prade, H.: Possibility theory and its applications: a retrospective and prospective view. In: Fuzzy Systems, FUZZ '03. The 12th IEEE International Conference, vol. 1, pp. 5–11 (2003)
19. Duc, B., Bigün, E.S., Bigün, J., Maître, G., Fischer, S.: Fusion of audio and video information for multi modal person authentication. Pattern Recognit. Lett. 18, 835–843 (1997)
20. Edgington, E.S.: An additive method for combining probability values from independent experiments. J. Psychol. 80(2), 351–363 (1972). doi:10.1080/00223980.1972.9924813
21. Ekenel, H.K., Fischer, M., Jin, Q., Stiefelhagen, R.: Multi-modal person identification in a smart environment. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
22. Ekenel, H.K., Jin, Q., Fischer, M., Stiefelhagen, R.: ISL person identification systems in the CLEAR 2007 evaluations. In: Stiefelhagen, R., Bowers, R., Fiscus, J. (eds.) Multimodal Technologies for Perception of Humans, vol. 4625, pp. 256–265. Springer, Berlin/Heidelberg (2008). doi:10.1007/978-3-540-68585-2_24
23. Eugene, T.H., Weinstein, E., Kabir, R., Park, A.: Multi-modal face and speaker identification on a handheld device. In: Proceedings Workshop Multimodal User Authentication, pp. 120–132 (2003)
24. Fisher, S.R.A.: Statistical Methods for Research Workers, vol. 14, pp. 140–142. Oliver and Boyd, Edinburgh (1970)
25. Smarandache, F., Dezert, J. (eds.): Advances and Applications of DSmT for Information Fusion (Collected Works), vol. 2. Am. Res. Press (2006)
26. Fox, N., Gross, R., Cohn, J., Reilly, R.: Robust biometric person identification using automatic classifier fusion of speech, mouth, and face experts. IEEE Trans. Multimed. 9(4), 701–714 (2007)
27. Shafer, G.: Perspectives on the theory and practice of belief functions. Int. J. Approx. Reason. 4(5–6), 323–362 (1990)
28. Good, I.J.: On the weighted combination of significance tests. J. R. Stat. Soc. Ser. B Methodol. 17(2), 264–265 (1955)
29. Hu, R., Damper, R.: Fusion of two classifiers for speaker identification: removing and not removing silence. In: Information Fusion, 2005 8th International Conference, vol. 1, p. 8 (2005)
30. Jain, A., Nandakumar, K., Ross, A.: Score normalization in multimodal biometric systems. Pattern Recogn. 38(12), 2270–2285 (2005)
31. Jost, L.: Combining significance levels from multiple experiments or analyses. http://www.loujost.com/Statistics and Physics/StatsArticlesIndex.htm (2009). Accessed 18 June 2009
32. Papadopoulos, H., Proedrou, K., Vovk, V., Gammerman, A.: Inductive confidence machines for regression. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) Proceedings of the 13th European Conference on Machine Learning, vol. 2430, pp. 345–356 (2002)

33. Lambrou, A., Papadopoulos, H., Gammerman, A.: Reliable confidence measures for medical diag-nosis with evolutionary algorithms. IEEE transactions on information technology in biomedicine:a publication of the IEEE Engineering. Med Biol Soc 15(1), 93–99 (2011). (PMID: 21062682)doi:10.1109/TITB.2010.2091144

34. Lancaster, H.O.: The combination of probabilities: an application of orthonormal functions. Aust. N. Z.J. Stat. 3(1), 20–33 (1961)

35. Li, M., Vitanyi, P.: An introduction to Kolmogorov complexity and its applications. 2nd edn. Springer,Secaucus (1997)

Page 21: Conformal predictions for information fusion

Conformal predictions for information fusion

36. Liptak, T.: On the combination of independent tests. Magyar Tud Akad Mat Kutato Int Kozl 3, 171–197(1958)

37. Little, G., Krishna, S., Black, J., Panchanathan, S.: A methodology for evaluating robustness of facerecognition algorithms with respect to variations in pose and illumination angle. In: Proceedings of IEEEInternational Conference on Acoustics, Speech and Signal Processing, pp. 89–92. Philadelphia. (2005)

38. Loughin, T.M.: A systematic comparison of methods for combining p-values from independent tests.Comput. Stat. Data Anal. 47(3), 467–485 (2004)

39. Luque, J., Morros, R., Garde, A., Anguita, J., Farrus, M., Macho, D., Marqus, F., Martnez, C., Vilaplana,V., Hernando, J.: Audio, video and multimodal person identification in a smart room. In: MultimodalTechnologies for Perception of Humans, pp. 258–269 (2007)

40. Marcel, S., McCool, C., Chakraborty, S., Balasubramanian, V., Panchanathan, S., Nolazco, J., Garcia,L., Aceves, R., et al.: Mobile biometry (mobio) face and speaker verification evaluation. In: Proceedingsof the 20th International Conference on Pattern Recognition (ICPR2010) (2010)

41. Michaelsen, E., Jaeger, K.: Evidence fusion using the GESTALT-system. In: Information Fusion, 200811th International Conference on, pp. 1–7. IEEE (2008)

42. Mosteller, F., Bush, R.R., Green, B.F.: Selected quantitative techniques. Addison-Wesley (1970)43. Mudholkar, G.S., George, E.O.: The logit method for combining probabilities. In: Symposium on

Optimizing Methods in Statistics, pp. 345–366. Academic Press, New York (1979)44. Murphy-Chutorian, E., Trivedi, M.: Head pose estimation in computer vision: a survey. IEEE Trans.

Pattern Anal. Mach. Intell. 31(4), 607–626 (2009)45. Nolazco-Flores, J., Garcia-Perera, P.: Enhancing acoustic models for robust speaker verification. In: Pro-

ceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2008)

46. Nouretdinov, I., Melluish, T., Vovk, V.: Ridge regression confidence machine. In: Proceedings of the18th International Conference on Machine Learning, pp. 385—392 (2001)

47. Palanivel, S., Yegnanarayana, B.: Multimodal person authentication using speech, face and visual speech.Comput. Vis. Image Underst. 109(1), 44–55 (2008)

48. Papadopoulos, H.: Inductive conformal prediction: theory and application to neural networks. In: Toolsin Artificial Intelligence, pp. 315–329 (2008)

49. Papadopoulos, H., Vovk, V., Gammerman, A.: Regression conformal prediction with nearest neighbours.J. Artif. Int. Res 40(1), 815-840 (2011). http://dl.acm.org/citation.cfm?id=20169452016967

50. Pesarin, F.: Multivariate permutation tests: with applications in biostatistics, vol. 240. Wiley, Chichester(2001)

51. Poh, N., Korczak, J.: Hybrid biometric person authentication using face and voice features. In:Proceedings AVBPA, pp. 348–353 (2001)

52. Proedrou, K.: Rigorous measures of confidence for pattern recognition and regression. PhD thesis, RoyalHolloway College, University of London, Advisor-Alex Gammerman (2003)

53. Rogova, G.L., Nimier, V.: Reliability in information fusion: literature survey. In: Svensson, P.,Schubert, J. (eds.) Proceedings of the 7th International Conference on Information Fusion. InternationalSociety of Information Fusion, vol. II, pp. 1158–1165. Mountain View, CA (2004)

54. Sanderson, C.: Biometric person recognition: Face, speech and fusion. VDM Publishing (2008)55. Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton (1976)56. Shafer, G., Vovk, V.: A tutorial on conformal prediction. J. Mach. Learn. Res 9, 371–421 (2008)57. Shaffer, J.P.: Multiple hypothesis testing. Annual Review of Psychology 46(1), 561–584 (1995).

doi:10.1146/annurev.ps.46.020195.00302158. Tippett, L.H.C.: The methods of statistics. 4th edn. Dover, New York (1963)59. Vovk, V.: On-line confidence machines are well-calibrated. In: FOCS ’02: 43rd Symposium on

Foundations of Computer Science. IEEE Computer Society, pp. 187–196, Washington, DC. (2002)60. Vovk, V., Gammerman, A., Shafer, G.: Algorithmic Learning in a Random World. Springer, Secaucus

(2005)61. Wilkinson, B.: A statistical consideration in psychological research. Psychol. Bull. 48(3), 156–8 (1951)62. Yang, F., Wang, H.-Z., Mi, H., de Lin, C., wen Cai, W.: Using random forest for reliable classification

and cost-sensitive learning for medical diagnosis. BMC Bioinforma. 10(1), S22 (2009)