
Dynamic Batch Size Selection for Batch Mode Active Learning in Biometrics

Shayok Chakraborty, Vineeth Balasubramanian and Sethuraman Panchanathan
Center for Cognitive Ubiquitous Computing
Arizona State University
Tempe, Arizona, USA

(schakr10, vineeth.nb, panch)@asu.edu

Abstract—Robust biometric recognition is of paramount importance in security and surveillance applications. In face based biometric systems, data is usually collected using a video camera with a high frame rate, and thus the captured data has high redundancy. Selecting the appropriate instances from this data to update a classification model is a significant, yet valuable challenge. Active learning methods have gained popularity in identifying the salient and exemplar data instances from superfluous sets. Batch mode active learning schemes attempt to select a batch of samples simultaneously rather than updating the model after selecting every single data point. Existing work on batch mode active learning assumes a fixed batch size, which is not a practical assumption in biometric recognition applications. In this paper, we propose a novel framework to dynamically select the batch size using clustering based unsupervised learning techniques. We also present a batch mode active learning strategy specially suited to handle the high redundancy in biometric datasets. The results obtained on the challenging VidTIMIT and MOBIO datasets corroborate the superiority of dynamic batch size selection over static batch size and also certify the potential of the proposed active learning scheme for use in real world biometric recognition applications.

Keywords-active learning; DBSCAN clustering; numerical optimization

I. INTRODUCTION

The rapid proliferation of technology and the advent of modern technological equipment have resulted in the generation of large quantities of digital data. This has expanded the possibilities of solving real world problems using computational learning frameworks. However, annotating large quantities of data (with class labels) is an expensive process in terms of time and human labor. Active learning algorithms seek to alleviate this problem by selecting the salient data instances required to construct a robust classifier. This tremendously reduces the human labeling effort and also exposes the classifier to the most informative examples from the underlying data population.

Face based biometrics forms an integral part of automated human recognition and verification systems. Such systems inherently rely on video streams to infer identities of subjects. Modern video cameras have a high frame rate and consequently the images forming a video stream have high redundancy among them. Selecting the salient and representative samples from this superfluous set (to induce a classifier) is a challenging problem. An active learning algorithm which can select the promising instances from such data will be of immense use in facilitating the learning process in such a scenario. Moreover, due to the implicit redundancy and the vast quantities of data collected in biometric applications, there is a need to simultaneously select and learn from batches of data samples in a video stream, instead of updating the classifier after every single data point is selected. Such a batch mode active learning framework can have applications in security and surveillance as well as in robotics and wearable vision systems, where there is high redundancy in data.

Present work on batch mode active learning (BMAL) samples batches of points from an unlabeled set such that some criterion is satisfied (e.g., a heuristic score is optimized). However, all such techniques assume that the batch size (the number of data points to be selected from a pool of unlabeled points to update a classification model) is specified by the end user ([1], [2], [3]). In an application like face-based biometric recognition, this is not a practical assumption. To illustrate this, consider two video streams, one containing images of just a single subject and the other containing images of 10 different subjects. Intuitively, the second video stream contains a larger variety of images and therefore the batch size should be greater for the second stream than for the first. In such situations, it is impractical to decide on a batch size beforehand and use the same value to analyze all video streams. Instead, the number of points to be selected should depend on the quality of the video stream being analyzed and should be greater if the video contains multifarious images. In other words, there is a need to dynamically determine the batch size in active learning algorithms for face-based biometrics. Further, once the batch size is determined, the images that are most appropriate for the given application at hand have to be selected.

In order to address the aforementioned issues, we propose two major contributions in this paper: (i) we introduce a novel methodology to automatically select the batch size, i.e. the number of data samples that need to be used for learning a classifier in a given video stream, for batch mode active learning algorithms; (ii) we also present an optimization based BMAL strategy that is specifically tailored to identify the most distinctive samples in a given video stream, once the batch size has been determined.

Although validated on biometric data in this work, the proposed framework is generic and can be used in many other applications where it may be necessary to select a number of representative entities from repetitious samples.

The rest of the paper is organized as follows: Section 2 presents a survey of existing active learning algorithms, Section 3 details the mathematical formulation of the framework, Section 4 presents the results of our experiments, and Section 5 concludes with discussions.

II. RELATED WORK

We begin with a survey of active learning techniques that have been used in biometric applications. We then present an overview of active learning techniques in general, followed by existing work in batch mode active learning.

A. Active Learning in Biometrics

Active learning has been applied to the field of face-based biometrics in earlier work, although from a different perspective. The work by Hewitt and Belongie [4] actively selected face images for manual annotation, but was focused on tracking rather than recognition. Balasubramanian et al. [5] applied a transductive active learning approach to face recognition in the online setting. Very recently, Kapoor et al. [6] incorporated match and non-match constraints in active learning for face recognition. However, all the above applications of active learning in biometrics focused on selecting a single data instance at a time from a video stream; no approach has been developed to select a batch of samples in one shot.

B. Active Learning: A General Survey

Active learning has been extensively applied in many domains like text classification, image segmentation and image retrieval. It can be broadly categorized as shown in Figure 1. At the highest level, we can divide such methods into two kinds: pool based and online. Pool based active learning is further divided into Serial Query based Active Learning and Batch Mode Active Learning (BMAL). In a serial query based active learning system, the classifier is updated after every single query. This is time consuming as the model needs to be retrained frequently. Also, if a simultaneous labeling system is available (for example, all the frames in the video of a given subject can be annotated with a single label query), the serial query based approach results in poor utilization of available resources. Batch mode active learning schemes address this issue by selecting multiple instances at a time from the unlabeled pool for annotation. Also, if the data is completely labeled, such methods select the optimal subset of points that are required to update a given classifier. We briefly review each of the categories and sub-categories.

Figure 1. Categories of active learning.

In online active learning, the learner encounters the data points sequentially, and at each instant, the model has to decide whether to query the current point and update the hypothesis [7] [8]. The fundamental challenge in online active learning is to design a query function for each individual point as it arrives, without having access to the entire set of unlabeled data.

In a serial query pool based system, the learner is exposed to a pool of unlabeled instances; it iteratively selects a single example for manual annotation and updates the hypothesis with the returned label. The majority of active learning algorithms have been applied in this setting and can be divided into 4 categories as shown in Figure 1: (i) SVM based approaches [9], (ii) Statistical approaches [10], (iii) Ensemble based approaches [11][12] and (iv) other miscellaneous approaches [13][14]. However, all these methods are designed to select only a single instance in each iteration of the algorithm. A review of existing batch mode active learning algorithms follows.

C. Batch Mode Active Learning

While serial query based active learning has been widely used, batch mode active learning has been comparatively less explored. Brinker [3] proposed a BMAL scheme where a batch of points was incrementally sampled by ensuring at each step that the hyperplane induced by the selected point maximizes the angle with all the hyperplanes of the already selected points. Hoi et al. [15] used the Fisher information matrix as a measure of model uncertainty and proposed to select a batch of points which maximally reduced the Fisher information in the classification model. The same authors also applied the batch mode active learning concept to the problem of content based image retrieval [16] [1] and medical image classification [2]. Guo and Schuurmans [17] formalized the problem by proposing an optimization-based solution to select the most appropriate batch of unlabeled points for active learning. This approach has a well-defined mathematical basis, compared to the other heuristic approaches. Hence, our methodology builds on a similar optimization formulation, which is however tailored to provide better performance in the given biometrics application.

All the existing approaches to batch mode active learning (including [17]) operate under the assumption of a predetermined batch size. As mentioned earlier, in a biometric recognition application, it is difficult to decide the batch size in advance. In this work, we propose a clustering based strategy to dynamically compute the batch size for a given video stream. We then formulate an optimization based BMAL scheme which is specifically suited to identify distinctive data samples in biometric data. We now describe the algorithmic details of our approach.

III. ALGORITHM DETAILS

We split this section into two parts: in the first, we describe a strategy to dynamically decide the batch size for a given unlabeled video stream. In the second part, we present an optimization based framework to select a batch of images once the batch size has been determined.

A. Dynamically Selecting the Batch Size

In order to obtain a reliable classification model, it is imperative to expose the classifier to all possible salient entities in a video stream. For example, if there are multiple subjects in a video stream, it would be necessary that the salient images of all the subjects are selected to update the classifier. Thus, given an unlabeled video, the batch selection algorithm should be able to isolate the data points that belong to each of the subjects present in the video stream. This motivates the application of a clustering algorithm to segregate the unlabeled stream into relatively pure clusters (in terms of class labels).

Further, we need a clustering strategy that does not require the number of clusters as an input parameter, as the system is not expected to know this in advance. Clustering algorithms like k-means will therefore not be suitable for this work. We instead require an algorithm which can perform clustering based on the density of the given points. The DBSCAN algorithm is based on the notion of point density and isolates high density regions as separate clusters. It automatically determines the number of clusters for a given set of points and is hence most suitable for our work. The algorithm labels each point as a core point, border point or a noise point depending on the number of points in a predetermined neighborhood. Each group of connected core points is designated as a separate cluster and each border point is assigned to the closest core point cluster. It requires the number of neighborhood points and the neighborhood radius as input parameters. The neighborhood radius can be computed from the number of neighborhood points using the sorted distance graph. For details about this method, please refer to [18].
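For concreteness, the following is a minimal sketch of this clustering step, assuming scikit-learn's DBSCAN implementation and a matrix X of frame-level feature vectors. The percentile-based estimate of the neighborhood radius is a crude stand-in for reading the knee of the sorted k-distance graph by hand, and the function name cluster_video_frames is ours, not the paper's.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def cluster_video_frames(X, min_pts=5):
    """Cluster frame feature vectors with DBSCAN; the number of clusters
    is discovered automatically.  min_pts = 5 follows the paper; eps is
    read off the sorted k-distance graph, with the knee approximated by
    a high percentile (a simplification of the manual procedure in [18])."""
    # distance of every point to its min_pts-th nearest neighbour
    # (the query point itself is counted as the first neighbour)
    nn = NearestNeighbors(n_neighbors=min_pts).fit(X)
    k_dist = np.sort(nn.kneighbors(X)[0][:, -1])
    eps = np.percentile(k_dist, 90)          # crude knee estimate
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
    return labels                             # -1 marks noise points
```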

To decide the batch size from the cluster structure of the data, we need a strategy to guide the number of points to be selected from each cluster. Evidently, this number should depend on parameters that are associated with the clusters obtained by the DBSCAN algorithm, such as cohesion and separation. The Silhouette coefficient is a commonly used unsupervised cluster evaluation metric which combines the cohesion and separation measures. For the ith point in a cluster, the Silhouette coefficient is defined as [18]:

s_i = \frac{b_i - a_i}{\max(a_i, b_i)}    (1)

where a_i is the average distance of the ith point to all objects in its cluster and b_i is the minimum of the average distances of the ith point to each of the other clusters. The Silhouette coefficient for an entire cluster can then be computed as the average of the coefficients of each point forming the cluster. The coefficient can attain a maximum value of 1, where a high value denotes a compact and well separated cluster.
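A direct implementation of Equation 1, averaged per cluster, might look like the sketch below. The helper name cluster_silhouettes and the NumPy/SciPy dependencies are our choices, and DBSCAN noise points are simply ignored, which is an assumption about their treatment.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_silhouettes(X, labels):
    """Per-cluster Silhouette coefficients following Eq. (1).
    Assumes at least two clusters; DBSCAN noise points (label -1) are ignored."""
    cluster_ids = [c for c in np.unique(labels) if c != -1]
    sil = {}
    for c in cluster_ids:
        in_c = X[labels == c]
        s_vals = []
        for x in in_c:
            # a_i: average distance to the other points in the same cluster
            d_same = cdist(x[None, :], in_c).ravel()
            a_i = d_same.sum() / max(len(in_c) - 1, 1)
            # b_i: smallest average distance to any other cluster
            b_i = min(cdist(x[None, :], X[labels == o]).mean()
                      for o in cluster_ids if o != c)
            s_vals.append((b_i - a_i) / max(a_i, b_i))
        sil[c] = float(np.mean(s_vals))       # cluster-level coefficient
    return sil
```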

We would like to select fewer points if a cluster is compact and well separated, and more points otherwise. Hence, the number of points to be selected from a cluster should be proportional to (1 - the Silhouette coefficient). Also, the number of points to be selected should be proportional to the fraction of the total number of points that fall in the given cluster, as we would like to select more points from larger clusters. If m is the total number of points, m_i is the number of points in cluster i, SC_i is the Silhouette coefficient of cluster i and C is a constant, the number of points to be selected from cluster i can thus be defined as:

N_i = C \cdot \frac{m_i}{m} \cdot (1 - SC_i)    (2)

This operation is performed for each of the identified clusters to compute the corresponding number of points to be selected. The sum of the values obtained across all clusters provides the overall batch size.
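Building on the previous sketch, Equation 2 and the overall batch size can be computed as below. The value C = 50 mirrors the constant used later in Experiment 2, but it is simply a tunable parameter here, and the function name is ours.

```python
import numpy as np

def dynamic_batch_size(X, labels, C=50):
    """Points to draw from each cluster via Eq. (2): N_i = C * (m_i / m) * (1 - SC_i).
    Uses cluster_silhouettes() from the previous sketch; C = 50 mirrors the
    value used in Experiment 2 but is simply a tunable constant."""
    sil = cluster_silhouettes(X, labels)
    labels = np.asarray(labels)
    m = len(labels)
    per_cluster = {c: int(round(C * (labels == c).sum() / m * (1.0 - sc)))
                   for c, sc in sil.items()}
    total_batch_size = sum(per_cluster.values())   # overall batch size
    return per_cluster, total_batch_size
```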

B. Batch Mode Active Learning for Biometrics

Having identified the batch size dynamically, it is equally important to select the most distinctive data samples from a given cluster that are to be used for updating the classification model. We now formulate an objective function that maximizes distinctiveness from the training set, and select the batch of points which optimizes the value of that objective function.

To append maximal information to the already available training set, the points selected from a cluster should be at a high distance from the existing training set. This will ensure that images which are very different from the available training data get selected from a cluster. From a data geometry point of view, it is possible that a term which selects images that are at a maximal distance from the existing training data will select points only from a small pocket in a cluster. In order to ensure that the updated classifier performs well on all images in a cluster, the images selected from the cluster should be representatives of the images that are not being selected. This condition can be satisfied by incorporating a term in the objective function which asserts that the uncertainty of the classifier in classifying the remaining unselected images is minimized. Entropy was taken as the measure of uncertainty in our work (similar to [19]). The two conditions together ensure that distinctive samples are picked and the selected images have minimal redundancy among them.

Formally, consider a BMAL problem which has a training set L_t and a classifier w_t trained on L_t. The classifier is exposed to an unlabeled video U_t at time t. The objective is to select a batch B containing m points which satisfies the above two conditions. We define a performance score function f(B) as follows:

f(B) = \sum_{i \in B} \mathrm{dist}(x_i, L_t) - \lambda \sum_{j \in U_t - B} S(y \mid x_j, w_{t+1})    (3)

where dist(x_i, L_t) denotes the Euclidean distance of the unlabeled point x_i from the current training set L_t (the distance of a point from a set of points is defined as the average distance of the point from all points in the set) and S(y | x_j, w_{t+1}) denotes the entropy of the updated model w_{t+1} in classifying the unlabeled point x_j. The first term denotes the sum of the distances of each selected point from the current training set, while the second term quantifies the sum of the uncertainties of the remaining unselected points in the unlabeled video. λ is a tradeoff parameter which controls the relative importance of labeled and unlabeled data. The problem thereby reduces to selecting a batch B of m unlabeled points which has maximum score f(B). This is a standard non-convex optimization problem and can be solved using gradient descent techniques. The Quasi-Newton method [20] was used to solve the problem in this work.
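The sketch below evaluates the score of Equation 3 for a candidate batch and selects a batch greedily. This greedy forward selection is only a simple surrogate for the quasi-Newton optimization actually used in the paper, and the class posteriors of the updated model are treated as a precomputed input rather than re-estimated after each candidate selection; both simplifications, and the function names, are our assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def batch_score(B_idx, X_unlabeled, X_train, posteriors, lam=1.0):
    """Evaluate f(B) of Eq. (3) for a candidate batch B_idx (indices into the
    unlabeled video).  `posteriors` holds class probabilities for every
    unlabeled frame under the updated model w_{t+1}; here they are treated
    as a precomputed input rather than re-estimated for every candidate."""
    # first term: average distance of each selected frame to the training set
    dist_term = cdist(X_unlabeled[B_idx], X_train).mean(axis=1).sum()
    # second term: entropy of the model on the frames NOT selected
    rest = np.setdiff1d(np.arange(len(X_unlabeled)), B_idx)
    p = np.clip(posteriors[rest], 1e-12, 1.0)
    entropy_term = (-(p * np.log(p)).sum(axis=1)).sum()
    return dist_term - lam * entropy_term

def greedy_select(X_unlabeled, X_train, posteriors, batch_size, lam=1.0):
    """Greedy forward selection: add, at each step, the frame that increases
    f(B) the most.  A simple surrogate for the quasi-Newton optimisation
    used in the paper, not the paper's exact solver."""
    selected = []
    for _ in range(batch_size):
        remaining = [i for i in range(len(X_unlabeled)) if i not in selected]
        best = max(remaining, key=lambda i: batch_score(
            np.array(selected + [i]), X_unlabeled, X_train, posteriors, lam))
        selected.append(best)
    return np.array(selected)
```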

IV. EXPERIMENTS AND RESULTS

A. Datasets

We used the VidTIMIT [21] and the MOBIO [22] face datasets in this work. The VidTIMIT dataset contains video recordings of subjects reciting short sentences under natural conditions. The MOBIO (Mobile Biometry) dataset was created for the MOBIO challenge to test the performance of state-of-the-art face and speech recognition algorithms. It contains videos of subjects under challenging real world conditions. Both these datasets contain redundant information and are hence suitable to test active learning algorithms. 25 subjects were randomly chosen from each dataset for our experiments. Our aim was to test the performance of active learning methods on face based biometrics and hence, we did not follow the specific protocols of the actual MOBIO face recognition challenge in this work.

B. Feature Extraction

For the clustering technique to work effectively, we need to extract appropriate feature vectors from the face images which ensure that images of a particular subject have a smaller distance among them compared to images of different subjects. Our experiments showed that the Discrete Cosine Transform (DCT) feature captured the subject variability desirably. Each frame in the video sequence contained a single face image, which was automatically detected and cropped to 128 by 128 pixels. Block by block DCT was applied to each image to extract the feature vectors (please refer to Ekenel et al. [23] for more details). DBSCAN is based on distance computations and does not work well in very high dimensions. Therefore, PCA was applied to reduce the dimension from 2560 to 100, retaining about 99% of the variance.
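As an illustration of this pipeline, the sketch below computes block-wise DCT features followed by PCA. The 8x8 block size and 10 retained coefficients per block are assumptions chosen so that a 128 by 128 crop yields 2560 dimensions as stated above, and the simple row-major truncation stands in for the usual zig-zag scan of Ekenel et al. [23]; none of these settings are confirmed by the paper.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.decomposition import PCA

def dct_features(face_img, block=8, n_coeffs=10):
    """Block-wise 2D DCT features from a 128x128 grayscale face crop.
    With 8x8 blocks and 10 coefficients per block, a 128x128 crop yields
    16 * 16 * 10 = 2560 dimensions, matching the dimensionality quoted above;
    the block size, coefficient count and the row-major truncation are
    illustrative choices."""
    feats = []
    for r in range(0, face_img.shape[0], block):
        for c in range(0, face_img.shape[1], block):
            patch = face_img[r:r + block, c:c + block].astype(float)
            coeffs = dct(dct(patch.T, norm='ortho').T, norm='ortho')
            feats.extend(coeffs.flatten()[:n_coeffs])
    return np.asarray(feats)

# PCA to 100 dimensions, fit on the features of all extracted frames:
# pca = PCA(n_components=100).fit(all_frame_features)
# reduced = pca.transform(all_frame_features)
```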

Figure 2. Three different subjects from the VidTIMIT dataset

To demonstrate the efficacy of DCT in capturing subject-wise variability, we randomly selected videos and carried out the following experiment. Several videos of each subject were selected and DCT features were extracted from each frame, followed by PCA as described above. We computed the average distance between images of a particular subject as well as the average distance between images of different subjects. As a sample, we present results for 3 subjects (shown in Figure 2) from the VidTIMIT dataset. Table I summarizes our findings:

Table I. Inter-class and intra-class average distances of 3 subjects from the VidTIMIT dataset.

              Subject 1   Subject 2   Subject 3
  Subject 1     11.14       17.51       15.89
  Subject 2     17.51       14.93       16.92
  Subject 3     15.89       16.92       11.38

From the table, it is evident that images of a given subject have a much smaller distance between them as compared to images of different subjects. Thus, the DCT feature aptly captures the subject variations and is therefore well suited for our work. Our later experiments confirmed that the SIFT feature also had this property and could have been used in our work.
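The intra- and inter-class averages in Table I can be reproduced with a routine along these lines; the function name and the SciPy dependency are ours.

```python
import numpy as np
from scipy.spatial.distance import cdist

def average_pairwise_distance(feats_a, feats_b):
    """Average Euclidean distance between two sets of frame features.
    Passing the same array twice gives the intra-class distance (the diagonal
    of Table I); passing two different subjects gives the inter-class distance."""
    d = cdist(feats_a, feats_b)
    if feats_a is feats_b:
        n = len(feats_a)
        return d.sum() / (n * (n - 1))   # exclude the zero self-distances
    return d.mean()
```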


Figure 3. Performance of the DBSCAN clustering algorithm on the VidTIMIT and MOBIO datasets: (a) DBSCAN on the VidTIMIT and MOBIO datasets; (b) purity of the VidTIMIT and MOBIO clusters.

C. Classification Methodology

The entropy term in the objective function necessitates a classifier which can provide concrete probability estimates of a sample with respect to each of the class labels. With this constraint, we chose Gaussian Mixture Models (GMMs) as our base classification model. GMMs have been successfully used in face recognition [24]. The parameters of each Gaussian were trained using the Expectation Maximization (EM) algorithm [25].
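A minimal per-subject GMM classifier of this kind, returning the class posteriors needed for the entropy term of Equation 3, could be sketched as follows. The number of mixture components and the assumption of equal class priors are illustrative choices, not values reported in the paper, and the class name is ours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GMMFaceClassifier:
    """One Gaussian mixture per subject, fit with EM.  predict_proba() returns
    class posteriors under equal class priors, from which the entropy term of
    Eq. (3) can be computed.  The number of mixture components is an
    illustrative choice, not a value reported in the paper."""
    def __init__(self, n_components=3):
        self.n_components = n_components
        self.models = {}

    def fit(self, X, y):
        for c in np.unique(y):
            self.models[c] = GaussianMixture(self.n_components).fit(X[y == c])
        return self

    def predict_proba(self, X):
        # log-likelihood of each frame under each subject's mixture
        ll = np.column_stack([m.score_samples(X) for m in self.models.values()])
        ll -= ll.max(axis=1, keepdims=True)        # numerical stability
        p = np.exp(ll)
        return p / p.sum(axis=1, keepdims=True)
```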

D. Experiments

Experiment 1: The purpose of this experiment was to demonstrate the efficacy of the DBSCAN algorithm in identifying the number of subjects in a given video stream. To depict this, DBSCAN, with Euclidean distance as the distance measure, was applied on video streams where the number of subjects in each stream varied between 1 and 10. The number of frames in each video stream was kept the same (approximately 100) regardless of the number of subjects in it. Hence, it was impossible to get an estimate of the number of subjects from the size of the video stream. The number of neighborhood points was empirically selected as 5 and the neighborhood radius was computed from the sorted distance graph, as described in [18].

The results on the VidTIMIT and the MOBIO datasets are shown in Figure 3(a). The x axis denotes the number of subjects actually present in the video stream and the y axis denotes the number of clusters found by DBSCAN. Also, each bar in the graph depicts the average result over 10 trials with different combinations of the corresponding number of subjects (selected randomly from the set of 25 subjects in each dataset) to remove any subject-wise bias. It is noted that for both datasets, DBSCAN succeeds in accurately identifying the number of subjects in the video stream.

In addition to the number of clusters found, it is also important to analyze the "goodness" of each cluster. Purity is a supervised measure of cluster validity which computes the extent to which a cluster contains objects of a single class. The purity of cluster i is defined as [18]:

p_i = \max_j p_{ij}

where p_{ij} is the probability that a member of cluster i belongs to class j and is defined as:

p_{ij} = \frac{m_{ij}}{m_i}

where m_i is the number of objects in cluster i and m_{ij} is the number of objects of class j in cluster i. The overall purity of a clustering is given by:

\mathrm{purity} = \sum_{i=1}^{K} \frac{m_i}{m} p_i    (4)

where K is the total number of clusters, m is the total number of points and m_i is the number of points in the ith cluster.
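Equation 4 translates directly into code. In the sketch below, class labels are assumed to be integer-encoded and DBSCAN noise points (label -1) are skipped, which is our assumption about how they should be treated; the function name is ours.

```python
import numpy as np

def clustering_purity(cluster_labels, class_labels):
    """Overall purity of Eq. (4): each cluster contributes its size-weighted
    largest class fraction.  Class labels are assumed to be integer-encoded;
    DBSCAN noise points (cluster label -1) are skipped."""
    cluster_labels = np.asarray(cluster_labels)
    class_labels = np.asarray(class_labels)
    m = len(cluster_labels)
    total = 0.0
    for c in np.unique(cluster_labels):
        if c == -1:
            continue
        members = class_labels[cluster_labels == c]
        m_i = len(members)
        p_i = np.bincount(members).max() / m_i     # max_j p_ij
        total += (m_i / m) * p_i
    return total
```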

To assess the performance of DBSCAN, the purity of each of the clusters was computed using Equation 4. The results on the VidTIMIT and the MOBIO clusters are shown in Figure 3(b). As before, each bar represents the average purity over 10 trials with the corresponding number of subjects in the video stream.

For each dataset and for all the subject combinations used, it is seen that the purity is very close to its maximum value of 1. Thus, each of the individual clusters, to a large extent, contains images of a single subject only. Therefore, in addition to matching the number of clusters to the actual number of subjects, DBSCAN also isolates the images of different subjects into separate clusters. This validates the effectiveness of DBSCAN as a mechanism to dynamically select the batch size.

Experiment 2: The purpose of this experiment was to show the advantage of selecting the batch size dynamically for a given video stream for applications like face recognition.


Figure 4. Dynamic batch selection on the VidTIMIT and MOBIO datasets: (a) performance of dynamic batch size on the VidTIMIT dataset; (b) performance of dynamic batch size on the MOBIO dataset.

Figure 5. Comparison of static vs. dynamic batch selection on the VidTIMIT and MOBIO datasets: (a) comparison of static and dynamic batch size on the VidTIMIT dataset; (b) comparison of static and dynamic batch size on the MOBIO dataset.

To depict this, a classifier was induced with 1 training video of each of 25 subjects. 100 video streams were then presented to the learner, where the number of subjects in each varied between 1 and 10. For each video stream, Equation 2 was used to decide the batch size (the value of C was taken as 50) and the result of Equation 3 was used to select batches of images from each cluster in the unlabeled stream being analyzed. The training set was updated with the selected images and the classifier was tested on test videos containing the same subject(s) as in the corresponding unlabeled stream. As before, the process was repeated over 10 trials for each number of subjects.

To illustrate the potential of dynamic batch size, the accuracy obtained on test videos was compared against the accuracy when all the frames in the unlabeled stream were used to update the classification model, as well as when the batch size was static and predetermined. The static batch size was taken as 10 (the effect of this parameter is discussed later in this section).

The results of this experiment are shown in Figure 4. Each bar represents the mean accuracy obtained on test videos over 10 trials with the corresponding number of subjects to remove any subject-wise bias. It is seen that for both datasets, the mean accuracy obtained using dynamic batch selection is very close to that obtained when all the frames were used for learning. This shows the efficiency of the dynamic batch selection framework in accurately identifying the batch size for a given video, so that the resulting model is comparable to the one obtained when trained on all the frames. The graphs also depict that the accuracy values obtained using dynamic selection are much better than when the batch size is decided a priori.

In general, we can expect that the greater the number of images we select from a batch, the greater the accuracy of the trained learner on a test set containing the same subjects. Thus, if we select a greater value of the static batch size, it is expected to perform better than what is depicted in Figure 4. Figure 5 supports this statement, where the static batch size was selected as 80 instead of 10. We note that the static batch selection strategy performs marginally better than dynamic selection and is closer to the accuracy obtained when all frames are used for learning. However, to achieve this marginal improvement in accuracy, the static batch selection framework required a significantly larger number of frames to be labeled. Figure 6 presents a comparative analysis of the number of frames that had to be labeled between the dynamic selection strategy and the static framework with batch size 80. The x axis denotes the number of subjects in the video stream and the y axis represents the average percentage increment in the number of frames that had to be labeled. The result shows that for both datasets and for each number of subjects in the video stream, the static framework required many more labeled frames to marginally outweigh dynamic selection.

Figure 6. Mean increment in labeling cost using static selection with batch size 80 as compared to dynamic selection.

Therefore, for a given unlabeled video stream, the dynamic batch selection framework proposed in this paper provides a mathematical basis to decide the batch size based on the quality of the images in the given video stream. The static framework, on the contrary, requires the batch size to be selected at random, without any knowledge of the video stream in question. As demonstrated by our experiments, in some cases it can select too few frames and consequently attain poor accuracy values, while in other cases it may select too many frames at a considerable labeling cost to achieve an insignificant increment in accuracy.

Experiment 3: In addition to the above experiments, we also studied the performance of the optimization based BMAL strategy in comparison to other heuristic BMAL techniques. To study this, a classifier was induced on 1 training video of each of the 25 subjects. 100 unlabeled video streams (with a varying number of subjects in each) were presented to the classifier one after another. For each stream, the batch size was dynamically selected and optimization based BMAL was applied to select a batch of images. The selected images were appended to the training set, the classifier was updated and tested on a test video containing 4500 images spanning all the 25 subjects. The objective was to study the growth in accuracy on the same test video with increasing size of the training set.

The proposed optimization based approach was compared with three other BMAL schemes: (i) Random Sampling, where a batch of points was randomly queried from the unlabeled video; (ii) SVM Active Learning with Angular Diversity, where a batch of points was incrementally sampled such that at each step the hyperplane induced by the selected point maximizes the angle with all the hyperplanes of the already selected points, as proposed by Brinker [3] (this method incrementally selects points by maximizing angular distance at each step, rather than optimizing a global cost function, as in our formulation); and (iii) Uncertainty Based Ranked Selection, where the top k uncertain points were queried from the unlabeled video, k being the batch size.
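Baseline (iii) is straightforward to sketch given class posteriors from the current model; the function name, and the use of entropy as the uncertainty measure (consistent with the rest of the paper), are our choices.

```python
import numpy as np

def uncertainty_ranked_selection(posteriors, k):
    """Baseline (iii): return the indices of the k frames whose predicted class
    distribution has the highest entropy under the current model.  `posteriors`
    is the frames-by-classes probability matrix, e.g. from the GMMFaceClassifier
    sketch above."""
    p = np.clip(posteriors, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]      # indices of the k most uncertain
```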

For each video stream, the computed batch size was noted and this value was used for the corresponding unlabeled video in each of the heuristic techniques, to keep comparisons fair. The results are shown in Figure 7. As the x axis of the graphs indicates, with every new video stream that enters the system, the performance of the classifier improves over time. It is noted that the proposed optimization based framework performs much better than the other methods, as its accuracy on the test set grows at the fastest rate. The label complexity (the number of labeled examples needed to achieve a certain accuracy) is lowest for the proposed technique.

V. DISCUSSION AND FUTURE WORK

In this paper, we proposed a novel strategy to dynamically decide the batch size in a batch mode active learning application. We exploited clustering based unsupervised learning techniques which provided a reliable estimate of the number of subjects in the video stream; the batch size was then computed from the cluster structure of the data. The experimental results corroborated the superiority of dynamic batch selection over static selection (in terms of accuracy and labeling cost). We also presented a novel optimization based BMAL approach to select data instances once the batch size has been determined. The experimental study confirmed the efficacy of this method in obtaining greater generalization accuracy on unseen test videos as compared to other heuristic methods.

In this work, the batch size was computed from the Silhouette coefficient and the cluster size. We plan to explore other methods of computing the batch size from the clustering parameters in our ongoing work. We also intend to study other possible methods of dynamically selecting the batch size, for example, incorporating it as one of the variables in the optimization problem. Having validated the usefulness of dynamic batch selection, we plan to focus on time complexity and scaling issues in our future work.


Figure 7. Performance of different batch mode active learning strategies on the VidTIMIT and MOBIO datasets: (a) batch mode active learning on the VidTIMIT dataset; (b) batch mode active learning on the MOBIO dataset. Test set accuracy is measured as the number of correct predictions on the test set as a percentage of the total number of images in the same set.

REFERENCES

[1] S. Hoi, R. Jin, and M. Lyu, "Batch mode active learning with applications to text categorization and image retrieval," IEEE TKDE, 2009.

[2] S. C. H. Hoi, R. Jin, J. Zhu, and M. R. Lyu, "Batch mode active learning and its application to medical image classification," in ICML, 2006.

[3] K. Brinker, "Incorporating diversity in active learning with support vector machines," in ICML, 2003.

[4] R. Hewitt and S. Belongie, "Active learning in face recognition: Using tracking to build a face model," in IEEE CVPR, 2006.

[5] V. Balasubramanian, S. Chakraborty, and S. Panchanathan, "Generalized query by transduction for online active learning," in OLCV Workshop at ICCV, 2009.

[6] A. Kapoor, G. Hua, A. Akbarzadeh, and S. Baker, "Which faces to tag: Adding prior constraints into active learning," in ICCV, 2009.

[7] C. Monteleoni and M. Kaariainen, "Practical online active learning for classification," in IEEE CVPR, 2007.

[8] S. Ho and H. Wechsler, "Query by transduction," IEEE TPAMI, 2008.

[9] S. Tong and D. Koller, "Support vector machine active learning with applications to text classification," JMLR, 2000.

[10] D. Cohn, Z. Ghahramani, and M. Jordan, "Active learning with statistical models," JAIR, 1996.

[11] Y. Freund, S. Seung, E. Shamir, and N. Tishby, "Selective sampling using the query by committee algorithm," Machine Learning, 1997.

[12] R. Liere and P. Tadepalli, "Active learning with committees for text categorization," ICAI, 1997.

[13] Y. Baram, R. El-Yaniv, and K. Luz, "Online choice of active learning algorithms," JMLR, vol. 5, 2004.

[14] A. McCallum and K. Nigam, "Employing EM and pool-based active learning for text classification," in ICML, 1998.

[15] S. C. H. Hoi, R. Jin, and M. R. Lyu, "Large-scale text categorization by batch mode active learning," in International Conference on World Wide Web. ACM, 2006.

[16] S. Hoi, R. Jin, J. Zhu, and M. Lyu, "Semi-supervised SVM batch mode active learning for image retrieval," in IEEE CVPR, 2008.

[17] Y. Guo and D. Schuurmans, "Discriminative batch mode active learning," in NIPS, 2008.

[18] P. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, 2006.

[19] Q. A. Wang, "Probability distribution and entropy as a measure of uncertainty," Journal of Physics A: Mathematical and Theoretical, vol. 41, 2008.

[20] J. Nocedal and S. J. Wright, Numerical Optimization. Springer, 1999.

[21] C. Sanderson, Biometric Person Recognition: Face, Speech and Fusion. VDM Verlag, Jun. 2008.

[22] S. Marcel, C. McCool, P. Matejka, T. Ahonen, and J. Cernocky, "Mobile biometry (MOBIO) face and speaker verification evaluation," Idiap Research Institute, Technical Report, 2010.

[23] H. Ekenel, M. Fischer, Q. Jin, and R. Stiefelhagen, "Multimodal person identification in a smart environment," in IEEE CVPR, 2007.

[24] J. Y. Kim, D. Y. Ko, and S. Y. Na, "Implementation and enhancement of GMM face recognition systems using flatness measure," in Robot and Human Interactive Communication, 2004.

[25] C. M. Bishop, Pattern Recognition and Machine Learning, 1st ed. Springer, Oct. 2007.
