
Semi-supervised Spectral Clustering for Image Set Classification

Arif Mahmood, Ajmal Mian, Robyn Owens
School of Computer Science and Software Engineering, The University of Western Australia

{arif.mahmood, ajmal.mian, robyn.owens}@uwa.edu.au

Abstract

We present an image set classification algorithm based on unsupervised clustering of labeled training and unlabeled test data, where labels are only used in the stopping criterion. The probability distribution of each class over the set of clusters is used to define a true set based similarity measure. To this end, we propose an iterative sparse spectral clustering algorithm. In each iteration, a proximity matrix is efficiently recomputed to better represent the local subspace structure. Initial clusters capture the global data structure and finer clusters at the later stages capture the subtle class differences not visible at the global scale. Image sets are compactly represented with multiple Grassmannian manifolds which are subsequently embedded in Euclidean space with the proposed spectral clustering algorithm. We also propose an efficient eigenvector solver which not only reduces the computational cost of spectral clustering by many folds but also improves the clustering quality and final classification results. Experiments on five standard datasets and comparison with seven existing techniques show the efficacy of our algorithm.

1. Introduction

Image set based object classification has recently received significant research interest [1, 2, 5, 10, 12, 17, 20, 28, 29, 30] due to its higher potential for accuracy and robustness compared to single image based approaches. An image set contains more appearance details such as multiple views, illumination variations and time lapsed changes. These variations complement each other, resulting in a better object representation. However, set based classification also introduces many new challenges. For example, in the presence of large intra-set variations, efficient representation turns out to be a difficult problem. Considering face recognition, it is well known that the images of different identities in the same pose are more similar compared to the images of the same identity in different poses (Fig. 1). Other important challenges include efficient utilization of all available information in a set to exploit intra-set similarities and inter-set dissimilarities.

Many existing image set classification techniques are


Figure 1. Projection on the top 3 principal directions of 5 subjects (in different colors) from the CMU Mobo face dataset. The natural data clusters do not follow the class labels and the underlying face subspaces are neither independent nor disjoint.

variants of the nearest neighbor (NN) algorithm where the NN distance is measured under some constraint such as representing sets with affine or convex hulls [8], regularized affine hull [3], or using the sparsity constraint to find the nearest points between image sets [30]. Since NN techniques utilize only a small part of the available data, they are more vulnerable to outliers.

At the other end of the spectrum are algorithms that represent the holistic set structure, generally as a linear subspace, and compute similarity as canonical correlations or principal angles [26]. However, the global structure may be a non-linear complex manifold and representing it with a single subspace will lead to incorrect classification [2]. Discriminant analysis has been used to force the class boundaries by finding a space where each class is more compact while different classes are far apart. Due to the multi-modal nature of the sets, such an optimization may not scale the inter-class distances appropriately (Fig. 1). In the middle of the spectrum are algorithms [2, 24, 28] that divide an image set into multiple local clusters (local subspaces or manifolds) and measure cluster-to-cluster distance [24, 28].



Figure 2. Proposed algorithm: (a) Face manifolds of 4 subjects from the CMU Mobo dataset in PCA space. (b) Each set transformed to a Grassmannian manifold. (c) Data matrix and class label vector. (d) Iterative Sparse Spectral Clustering Algorithm: proximity matrix based on Sparse Least Squares, a novel fast eigenvector solver and a supervised termination criterion. (e) Each class element assigned a latent cluster label. (f) Probability distribution of each class over the set of clusters. (g) Class probability distribution based distance measure.

Chen et al. [2] achieved improved performance by computing the distance between different locally linear subspaces. In all these techniques, classification is dominated by either a few data points, only one cluster, one local subspace, or one basis of the global set structure, while the rest of the image set data or local structure variations are ignored.

We propose a new framework in which the final classification decision is based on all data/clusters/subspaces contained in all image sets. Therefore, classification decisions are global compared to the existing local decisions. For this purpose, we apply unsupervised clustering on all data contained in the gallery and the probe sets, without enforcing set boundaries. We find natural data clusters based on the true data characteristics without using labels. Set labels are used only to determine the required number of clusters, i.e. in the termination criterion. The probability distribution of each class over the set of natural data clusters is computed. Classification is performed by measuring distances between the probability distributions of different classes.

The proposed framework is generic and applicable to any unsupervised clustering algorithm. However, we propose an improved version of the Sparse Subspace Clustering (SSC) algorithm [6] for our framework. SSC yields good results if different classes span independent or disjoint subspaces and is not directly applicable to the image-set classification problem, where the subspaces of different classes are neither independent nor disjoint. We propose various improvements over the SSC algorithm. We obtain the proximity matrix by sparse least squares with more emphasis on reconstruction error minimization than on sparsity induction. We perform clustering iteratively and in each iteration divide a parent cluster into a small number of child clusters using the NCut objective function. Coarser clusters capture the global data variations while the finer clusters, in the later iterations, capture the subtle local variations not visible at the global level. This coarse to fine scheme allows discrimination between globally similar data elements.

The proposed clustering algorithm can be directly used with the set samples; however, we represent each set with a Grassmannian manifold and perform clustering on the manifold basis matrices. This strategy reduces computational complexity and increases discrimination and robustness to different types of noise in the image sets. Using Grassmann manifolds, we build an ensemble of spectral classifiers which further increases accuracy and gives a reliability (confidence) of the label assignment.

Our final contribution is a fast eigenvector solver based on the group power iteration method. The proposed algorithm iteratively finds all eigenvectors simultaneously and terminates when the signs of the required eigenvector coefficients become stable. This is many times faster than the Matlab SVDS implementation and yields clusters of the same or better quality. Experiments are performed on three standard face image-set datasets (Honda, Mobo and Youtube Celebrities), an object categorization dataset (ETH 80), and the Cambridge hand gesture dataset. Results are compared to seven state-of-the-art algorithms. The proposed technique achieves the highest accuracy on all datasets. The maximum improvement is observed on the most challenging Youtube dataset, where our algorithm achieves 11.4% higher accuracy than the best reported.

2. Image Set Classification by Semi-supervised Clustering

Most image-set classification algorithms try to enforce the set boundaries in a supervised way, i.e. label assignment to the training data is often manual and based on real world semantics or prior information instead of the underlying data characteristics. For example, all images of the same subject are pre-assigned the same label even though the intra-subject dissimilarity may exceed the inter-subject dissimilarity. Therefore, the intrinsic data clusters may not align well with the imposed class boundaries.


Wang et al. [24, 28] computed multiple local clusters (sub-sets) within the set boundaries. Subsets are individually matched and the gallery class containing a subset maximally similar to a query subset becomes the label winner. This may address the problem of within-set dissimilarity, but most samples do not play any role in the label estimation.

We propose to compute the set similarities based on the probability distribution of each data set over an exhaustive number of natural data clusters. We propose unsupervised clustering to be performed on all data, the probe and all the training sets combined, without considering labels. By doing so, we get natural clusters based on the inherent data characteristics. Clusters are allowed to form across two or more gallery classes. Once an appropriate number of clusters is obtained, we use labels to compute the class probability distribution over the set of clusters.

Let $n_k$ be the number of natural clusters and $n_c$ be the number of gallery classes. For a class $c_i$ having $n_i$ data points, let $p_i \in \mathbb{R}^{n_k}$ be the distribution over all clusters: $\sum_{k=1}^{n_k} p_i[k] = 1$ and $1 \ge p_i[k] = n_i[k]/n_i \ge 0$, where $n_i[k]$ is the number of data points of class $c_i$ in the $k$-th cluster. Our basic framework does not put any condition on $n_k$; however, we argue by Lemma 2.1 that an optimal number of clusters exists and can be found for the task of set label estimation. The derivation of Lemma 2.1 is based on the notions of 'conditional orthogonality' and 'indivisibility' of clusters as defined below. We assume that classes $c_i$ and $c_j$ belong to the gallery ($G$) with known labels while $c_p$ is the probe set with unknown label.

Conditional Orthogonality. The distribution of class $c_i$ is conditionally orthogonal to the distribution of a class $c_j$ w.r.t. the distribution of the probe set $c_p$ if
$$(p_i \perp_c p_j)_{p_p} := \big\langle (p_i \wedge p_p),\ (p_j \wedge p_p) \big\rangle = 0 \quad \forall (c_i, c_j) \in G, \qquad (1)$$
where $\wedge$ is the logical AND operation and $\langle \cdot, \cdot \rangle$ is the inner product. If both operands of $\wedge$ are non-zero then the result is 1, otherwise the result is 0.

Indivisible Cluster. A cluster $k^*$ is indivisible from the probe set label estimation perspective if
$$p_i[k^*] \wedge p_j[k^*] \wedge p_p[k^*] = 0 \quad \forall (c_i, c_j) \in G. \qquad (2)$$
A cluster is 'indivisible' if either exactly one gallery class has non-zero probability over that cluster or the probability of the probe set is zero.

Lemma 2.1. The optimal number of clusters for the set labeling problem is the minimum number of clusters such that all gallery class distributions become orthogonal to each other w.r.t. the probe set distribution:
$$n_k^* \triangleq \min n_k \ \text{ s.t. }\ (p_i \perp_c p_j)_{p_p} \quad \forall c_i, c_j \in G. \qquad (3)$$
We only discuss an informal proof of Lemma 2.1. Condition (3) ensures that all clusters are indivisible, therefore increasing the number of clusters beyond $n_k^*$ will not yield more discrimination. For $n_k < n_k^*$, there will be some clusters with overlap between class distributions, hence reducing the discrimination. Thus $n_k^*$ is the optimal number of clusters for the task at hand.
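For illustration, the sketch below computes each class's distribution over a given clustering and checks that every cluster is indivisible in the sense of (2). It is a minimal NumPy sketch under assumed inputs (integer cluster labels and class labels per data point); the function names are ours, not the paper's.

```python
import numpy as np

def class_distributions(cluster_ids, class_ids, n_clusters):
    """p[c, k] = fraction of class c's points that fall in cluster k (Sec. 2)."""
    classes = np.unique(class_ids)
    p = np.zeros((len(classes), n_clusters))
    for row, c in enumerate(classes):
        counts = np.bincount(cluster_ids[class_ids == c], minlength=n_clusters)
        p[row] = counts / counts.sum()
    return classes, p

def all_indivisible(p_gallery, p_probe):
    """Eq. (2): every cluster touched by the probe set is shared with
    at most one gallery class."""
    probe_mask = p_probe > 0                          # clusters with probe mass
    gallery_hits = (p_gallery > 0).sum(axis=0)        # gallery classes per cluster
    return bool(np.all(~probe_mask | (gallery_hits <= 1)))
```

The supervised stopping criterion of Section 3.2 then amounts to splitting clusters until `all_indivisible` returns true.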

Once the optimality condition is satisfied, all clusters will be indivisible. Existing measures may then be used for the computation of distances between class-cluster distributions. For this purpose, we consider the Bhattacharyya and Hellinger distance measures. We also propose a modified Bhattacharyya distance which we empirically found to be more discriminative than the existing measures.

Bhattacharyya distance ($B_{i,p}$) between a class $c_i \in G$ and the probe set $c_p$, having probabilities $p_i[k]$ and $p_p[k]$ over the $k$-th cluster, is
$$B_{i,p} = -\ln \sum_{k=1}^{n_k^*} \sqrt{p_i[k]\, p_p[k]}. \qquad (4)$$

In (4), $\ln(0) := 0$, therefore $0 \le B_{i,p} \le 1$. The Hellinger distance is the $\ell_2$ norm distance between two probability distributions:
$$H_{i,p} = \frac{1}{\sqrt{2}} \sqrt{\sum_{k=1}^{n_k^*} \left( \sqrt{p_i[k]} - \sqrt{p_p[k]} \right)^2}. \qquad (5)$$

The Modified Bhattacharyya Distance ($BM_{i,p}$) of gallery class $c_i$ with probe class $c_p$ is given by
$$BM_{i,p} = -\ln \big\langle \sqrt{p_i},\ \sqrt{p_i} \cdot (p_i \wedge p_p) \big\rangle \big\langle \sqrt{p_p},\ \sqrt{p_p} \cdot (p_i \wedge p_p) \big\rangle, \qquad (6)$$
where $(\cdot)$ in this definition is a point-wise multiplication operator and has precedence over the inner product.

Lemma 2.2. The Bhattacharyya distance is upper bounded by the modified Bhattacharyya distance: $BM_{i,p} \ge B_{i,p}$.

The proof follows from the Cauchy-Schwarz inequality:
$$BM_{i,p} \ge -\ln \big\langle \sqrt{p_i},\ \sqrt{p_p} \big\rangle.$$
At the extreme cases, when the angle between the two distributions is 0 or 90°, $BM_{i,p} = B_{i,p}$, while for all other cases $BM_{i,p} > B_{i,p}$. Note that the factors forced to be zero in $BM_{i,p}$ by introducing the $\wedge$ operator are automatically canceled in $B_{i,p}$. The performance of the three measures was experimentally compared and we observed that $BM_{i,p}$ achieves the highest accuracy.
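The three measures are straightforward to compute from the class-over-cluster distributions. Below is a minimal sketch of (4)-(6), assuming 1-D arrays p_i and p_p that sum to one; the masked form of (6) follows our reading of the extracted equation, and the paper's ln(0) := 0 convention is mirrored by returning 0 for empty overlap.

```python
import numpy as np

def bhattacharyya(p_i, p_p):
    """Eq. (4); the paper defines ln(0) := 0 when the distributions do not overlap."""
    bc = np.sum(np.sqrt(p_i * p_p))
    return 0.0 if bc == 0 else -np.log(bc)

def hellinger(p_i, p_p):
    """Eq. (5)."""
    return np.sqrt(np.sum((np.sqrt(p_i) - np.sqrt(p_p)) ** 2)) / np.sqrt(2.0)

def modified_bhattacharyya(p_i, p_p):
    """Eq. (6): each distribution is masked by the clusters the two share."""
    shared = ((p_i > 0) & (p_p > 0)).astype(float)    # p_i AND p_p
    term_i = np.dot(np.sqrt(p_i), np.sqrt(p_i) * shared)
    term_p = np.dot(np.sqrt(p_p), np.sqrt(p_p) * shared)
    prod = term_i * term_p
    return 0.0 if prod == 0 else -np.log(prod)
```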

3. Spectral Clustering

The basic idea is to divide data points into different clusters using the spectrum of a proximity matrix which represents an undirected weighted graph. Each data point corresponds to a vertex and edge weights correspond to the similarity between two points. Let $G = \{X_i\}_{i=1}^{g} \in \mathbb{R}^{l \times n_g}$ be the collection of data points in the gallery sets. Here, $n_g = \sum_{i=1}^{g} n_i$ is the total number of data points in the gallery. The $i$-th image set has $n_i$ data points, each of dimension $l$. The gallery contains $g$ sets and $n_c$ classes: $g \ge n_c$, $X_i = \{x_j\}_{j=1}^{n_i} \in \mathbb{R}^{l \times n_i}$. Each data point $x_j$ could be a feature vector or simply the pixel values.

Let $c_p = \{x_i\}_{i=1}^{n_p} \in \mathbb{R}^{l \times n_p}$ be the probe set with a dummy label $n_c + 1$. We make a global data matrix by appending all gallery sets and the probe set: $D = [G\ c_p] \in \mathbb{R}^{l \times n_d}$, where $n_d = n_g + n_p$. The affinity matrix $A \in \mathbb{R}^{n_d \times n_d}$ is computed as
$$A_{i,j} = \begin{cases} \exp\!\left( \dfrac{-\|x_i - x_j\|_2^2}{2\sigma^2} \right) & \text{if } i \neq j \\ 0 & \text{if } i = j. \end{cases} \qquad (7)$$
From $A$, a degree matrix $D$ is computed:
$$D(i,j) = \begin{cases} \sum_{i=1}^{n_d} A(i,j) & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases} \qquad (8)$$
Using $A$ and $D$, a Laplacian matrix is computed:
$$L_w = D^{-1/2} A D^{-1/2}. \qquad (9)$$
Let $E = \{e_i\}_{i=1}^{n_c}$ be the matrix of the $n_c$ smallest eigenvectors of $L_w$. The eigenvectors of the Laplacian matrix embed the graph vertices into a Euclidean space where a NN approach can be used for clustering. Therefore, the rows of $E$ are unit normalized and grouped into $n_c$ clusters using kNN.
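The steps in (7)-(9) follow the standard normalized spectral clustering recipe. The sketch below is one possible NumPy implementation; the choice of σ, the use of the normalized Laplacian $I - L_w$ (whose smallest eigenvectors correspond to the largest of $L_w$), and k-means for the final grouping are common choices we assume here rather than the paper's exact steps.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters, sigma=1.0):
    """X: (l, n_d) data matrix, columns are points. Returns a cluster label per column."""
    sq_dists = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))               # Eq. (7)
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=0)                                         # Eq. (8), vertex degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt      # I - L_w, cf. Eq. (9)
    _, V = np.linalg.eigh(L_sym)                              # eigenvalues in ascending order
    E = V[:, :n_clusters]                                     # n_c smallest eigenvectors
    E /= np.linalg.norm(E, axis=1, keepdims=True) + 1e-12     # unit-normalize the rows
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(E)
```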

3.1. Proximity Matrix As Sparse Least Squares

Often, high dimensional data sets lie on low dimensional manifolds. In such cases, a Euclidean distance based proximity matrix is not an effective way to represent the geometric relationships among the data points. A more viable option is the sparse representation of data, which has been used for many tasks including label propagation [4], dimensionality reduction, image segmentation and face recognition [11]. Elhamifar and Vidal [6] have recently proposed sparse subspace clustering which can discriminate data lying on independent and disjoint subspaces.

A vector can only be represented as a linear combination of other vectors spanning the same subspace. Therefore, proximity matrices based on a linear decomposition of data points lead to subspace based clustering. Representing a data point $x_i$ as a linear combination of the remaining data points $\hat{D} = D \setminus x_i$ ensures that zero coefficients will only correspond to the points spanning different subspaces. Such a decomposition can be computed with least squares:
$$\alpha_i = (\hat{D}^\top \hat{D})^{-1} \hat{D}^\top x_i, \qquad (10)$$

where $\alpha_i$ are the linear coefficients. We are mainly concerned with the face space, which is neither independent nor disjoint across different subjects. Therefore, introducing a sparsity constraint on $\alpha_i$ ensures that the linear coefficients from less relevant subspaces are forced to zero:
$$\alpha_i^* := \min_{\alpha_i} \left( \|x_i - \hat{D}\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \right). \qquad (11)$$

The first term minimizes the reconstruction error while the second term induces sparsity. This process is repeated for all data points and the corresponding $\alpha_i$ are appended as columns of a matrix $S = \{\alpha_i\}_{i=1}^{n_d} \in \mathbb{R}^{n_d \times n_d}$. Some of the $\alpha$ coefficients may be negative and in general $S(i,j) \neq S(j,i)$. Therefore, a symmetric sparse LS proximity matrix is computed as $A = |S| + |S^\top|$ for spectral clustering.
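A possible implementation of (11) and the symmetrization $A = |S| + |S^\top|$ is sketched below, using scikit-learn's Lasso as a stand-in for the SPAMS solver used in the paper; note that sklearn scales the quadratic term by the number of rows, so its `alpha` plays the role of λ only up to a constant factor.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_ls_proximity(D, lam=0.01):
    """D: (l, n_d) data matrix. Returns the symmetric proximity matrix of Sec. 3.1."""
    _, n_d = D.shape
    S = np.zeros((n_d, n_d))
    for i in range(n_d):
        mask = np.arange(n_d) != i
        D_hat = D[:, mask]                                 # all points except x_i
        # Eq. (11): squared reconstruction error plus an l1 penalty on the coefficients
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        lasso.fit(D_hat, D[:, i])
        S[mask, i] = lasso.coef_
    return np.abs(S) + np.abs(S.T)                         # A = |S| + |S^T|
```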

3.2. Iterative Hierarchical Sparse Spectral Clusters

Conventionally, sparse spectral clustering is performed by simultaneous partitioning of the graph into $n_k$ clusters [6]. We argue that iterative hierarchical clustering has many advantages. In each iteration, we divide the graph into very few partitions/clusters (four in our implementation). If a cluster is not indivisible (2), we recompute the local sparse LS based proximity matrix only for that cluster and then re-compute the eigenvectors. Note that we do not reuse a part of the initial proximity matrix, because the sparse LS gives a different matrix due to the reduced number of candidates. The new matrix highlights only local connectivity as opposed to global connectivity at the highest level. Coarser clusters capture the large global variations while the finer clusters, obtained later, capture the subtle inter-class differences which may not be visible at the global scale. As a result, we are able to locally differentiate between data points which were globally similar. Moreover, as we explain next, in the case of simultaneous partitioning, an approximation error accumulates which adversely affects the cluster quality, whereas the iterative hierarchical approach enables us to obtain high quality clusters.

Let $a_l$ and $a_r$ be the two disjoint partitions. We want to find a graph cut that minimizes the sum of edge weights $\sum_{i \in |a_l|, j \in |a_r|} A(i,j)$ across the cut. MinCut is easy to implement but it may give unbalanced partitions. In the extreme case, minCut may separate only one node from the remaining graph. To ensure balanced partitions, we minimize the normalized cut (NCut) objective function [25, 7]
$$\frac{1}{V_{a_l}} \sum_{i \in |a_l|,\, j \in |a_r|} A(i,j) \ +\ \frac{1}{V_{a_r}} \sum_{i \in |a_l|,\, j \in |a_r|} A(i,j), \qquad (12)$$
where $V_{a_l}$ is the sum of all edge weights attached to the vertices in $a_l$. The objective function increases with the decrease in the number of nodes in a partition. Unfortunately, minCut with the NCut objective function is NP hard and only approximate solutions can be computed using the spectral embedding approach. It has been shown in [25] that the eigenvector corresponding to the second smallest eigenvalue ($e_2$) of
$$L_{sym} = D^{-1/2} (D - A) D^{-1/2} \qquad (13)$$
provides an approximate solution to a relaxed NCut problem. All data points corresponding to $e_2(i) \ge 0$ are assigned to $a_l$ while the remaining ones go to $a_r$. Similarly, $e_3$ can further divide each cluster into two more clusters and the higher order eigenvectors can be used for further partitioning the graph into smaller clusters. However, the approximation error accumulates, degrading the clustering quality.
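One such sign-based split can be sketched as follows; it computes $e_2$ of (13) for a given affinity matrix and bipartitions the vertices by its sign. The recursion over child clusters and the re-computation of the sparse proximity matrix are left to the caller.

```python
import numpy as np

def ncut_split(A):
    """One NCut-style bipartition of a graph with affinity matrix A, via Eq. (13)."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = D_inv_sqrt @ (np.diag(d) - A) @ D_inv_sqrt   # D^{-1/2}(D - A)D^{-1/2}
    _, V = np.linalg.eigh(L_sym)                         # eigenvalues in ascending order
    e2 = V[:, 1]                                         # second smallest eigenvector
    left = np.flatnonzero(e2 >= 0)                       # vertices assigned to a_l
    right = np.flatnonzero(e2 < 0)                       # vertices assigned to a_r
    return left, right
```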

Although good quality clusters can be obtained using the iterative approach, it requires more computation because the proximity matrix and eigenvector computations must be repeated at each iteration. If the data matrix $D$ is divided into four balanced partitions each time, the new matrices are 16 times smaller. Since the complexity of eigenvector solvers is $O(n_d^3)$, the complexity in the next iteration is $O((\frac{n_d}{4})^3)$ for each sub-problem. The depth of the recursion tree is $\log_4(n_d)$; however, the proposed supervised stopping criterion does not let the iterations continue until the end; rather, the process stops as soon as all clusters are indivisible. A computational complexity analysis of the recursion tree reveals that the overall complexity of the eigenvector computations remains $O(n_d^3)$.
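A short version of that analysis, assuming perfectly balanced four-way splits at every level of the recursion tree:

```latex
% Level t has 4^t sub-problems of size n_d / 4^t each, so with an O(m^3)
% eigensolver the cost of level t is 4^t (n_d/4^t)^3 = n_d^3 / 16^t.
% Summing the geometric series over all levels:
\sum_{t=0}^{\log_4 n_d} \frac{n_d^3}{16^t}
  \;\le\; n_d^3 \sum_{t=0}^{\infty} \left(\tfrac{1}{16}\right)^t
  \;=\; \tfrac{16}{15}\, n_d^3
  \;=\; O(n_d^3).
```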

4. Computational Cost Reduction

We propose two approaches for reducing the computational complexity of spectral clustering. The first approach reduces the data by embedding each image set on a Grassmann manifold and then using the manifold basis vectors to represent the set. The second approach is a fast eigenvector solver which quickly computes approximate eigenvectors using an early termination criterion based on sign changes.

4.1. Data Reduction by Grassmann Manifolds

Computing eigenvectors of large Laplacian matrices $L_{sym} \in \mathbb{R}^{n_d \times n_d}$ incurs a high computational cost. Often, a part of the data matrix $D$ or some columns from $L_{sym}$ are sampled, and eigenvectors are computed for the sampled data and extrapolated for the rest [7, 22, 18]. Approximate approaches provide sufficient accuracy for computing the most significant eigenvectors but are not as accurate for the least significant eigenvectors, which are actually required in spectral clustering. In contrast, we propose to represent each set by a compact representation and to perform clustering on the representation instead of the original data.

Our choice of image set representation is motivated by linear subspace based image-set representations [28, 26]. These subspaces may be considered as points on Grassmannian manifolds [9, 10]. While others performed discriminant analysis on the Grassmannian manifolds or computed manifold-to-manifold distances, we perform sparse spectral clustering on the Grassmannian manifolds.

The set of $\lambda$-dimensional linear subspaces of $\mathbb{R}^n$, with $n = \min(l, n_j)$ and $\lambda \le n$, is termed the Grassmann manifold $\mathrm{Grass}(\lambda, n)$. An element $\mathcal{Y}$ of $\mathrm{Grass}(\lambda, n)$ is a $\lambda$-dimensional subspace which can be specified by a set of $\lambda$ vectors, $Y = \{y_1, \dots, y_\lambda\} \in \mathbb{R}^{l \times \lambda}$, and $\mathcal{Y}$ is the set of all their linear combinations. For each data element of the image set, we compute a set of basis vectors $Y$, and the set of all such $Y$ matrices is termed the non-compact Stiefel manifold $ST(\lambda, n) := \{Y \in \mathbb{R}^{l \times \lambda} : \mathrm{rank}(Y) = \lambda\}$. We arrange all the $Y$ matrices in a basis matrix $B$ which is capable of representing each data point in the image set using just $\lambda$ of its columns. For the $i$-th data point in the $j$-th image set, $x_j^i \in X_j$, having $B_j$ as the basis matrix, $x_j^i = B_j \alpha_j^i$, where $\alpha_j^i$ is the set of linear parameters with $\|\alpha_j^i\|_0 = \lambda$. For the case of known $B_j$, we can find a matrix $\alpha_j = \{\alpha_j^1, \alpha_j^2, \dots, \alpha_j^{n_j}\}$ such that the residual approaches zero:
$$\min_{\alpha_j} \left( \sum_{i=1}^{n_j} \|x_j^i - B_j \alpha_j^i\|_2^2 \right) \ \text{ s.t. } \ \|\alpha_j\|_0 \le \lambda. \qquad (14)$$

Since $\ell_1$ can approximate $\ell_0$, we can estimate both $\alpha_j$ and $B_j$ iteratively by using the following objective function [16]:
$$\min_{\alpha_j, B_j} \left( \frac{1}{n_j} \sum_{i=1}^{n_j} \frac{1}{2} \|x_j^i - B_j \alpha_j^i\|_2^2 \right) \ \text{ s.t. } \ \|\alpha_j^i\|_1 \le \lambda. \qquad (15)$$

The solution is obtained by randomly initializing $B_j$ and computing $\alpha_j$, then fixing $\alpha_j$ and computing $B_j$. The size of $B_j \in \mathbb{R}^{l \times \Lambda}$ is significantly smaller than the corresponding image set $X_j \in \mathbb{R}^{l \times n_j}$. Representing each image set with $\Lambda$ basis vectors reduces the proximity matrix from $n_d \times n_d$ to $\Lambda g \times \Lambda g$, which significantly reduces the computational cost. Additionally, we observe that this representation also provides a significant increase in the accuracy of the proposed clustering algorithm because the underlying subspaces of each class are robustly captured in $B$, leaving out noise.
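One way to obtain a per-set basis $B_j$ in the spirit of (15) is alternating sparse coding and dictionary updates. The sketch below uses scikit-learn's DictionaryLearning as a stand-in for the SPAMS routines employed in the paper; the number of atoms (Λ) and the l1 weight are caller-supplied assumptions.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def set_basis(X_j, n_atoms, l1_weight=1.0):
    """X_j: (l, n_j) image set, columns are samples.
    Returns B_j in R^{l x n_atoms}, a compact basis for the set (cf. Eq. 15)."""
    dl = DictionaryLearning(n_components=n_atoms,
                            alpha=l1_weight,             # l1 penalty weight
                            fit_algorithm='lars',
                            transform_algorithm='lasso_lars',
                            max_iter=100)
    dl.fit(X_j.T)                                        # sklearn expects samples x features
    return dl.components_.T                              # columns span the set's subspace
```

Stacking the $B_j$ of all gallery sets and the probe set column-wise gives the reduced data matrix on which the sparse spectral clustering of Section 3 is run.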

4.2. Fast Approximate Eigenvector Solver

The size of the Laplacian matrix still grows with the number of subjects. To reduce the cost of eigenvector computations, we propose a fast group power iteration method which finds all the eigenvectors and eigenvalues simultaneously, given a sufficient number of iterations. The proposed algorithm always converges and has good numerical stability.

The signs of the $e_2$ eigenvector coefficients (corresponding to the minimum non-zero eigenvalue) of $L_{sym}$ provide an approximate solution to a relaxed NCut problem [25]. Therefore, if we get the correct signs with approximate magnitudes of the eigenvectors, we will still get the same quality of clusters. We exploit this fact in our proposed eigenvector solver and terminate the iterations when the number of sign changes falls below a threshold.


The power iteration algorithm is a fundamental technique for eigenvalue and eigenvector computation [31]. A random vector $v$ is repeatedly multiplied by the matrix $L$ and normalized. After $k$ iterations, $L v^{(k)} = \lambda v^{(k)}$, where $v^{(k)}$ is the most dominant eigenvector of $L$ and $\lambda$ the corresponding eigenvalue. To calculate the next eigenvector, the same process is repeated on the deflated matrix $L$.

The group power iteration method can be used for the computation of all eigenvectors simultaneously. A random matrix $V_r$ is iteratively multiplied by $L$. After each multiplication, a unit normalization step is required by division with $V_r^\top V_r$, which converges to a diagonal matrix as $V_r$ converges to an orthonormal matrix. To stop all columns from converging to the most dominant eigenvector, an orthogonalization step is also required. We use QR decomposition for this purpose: $V_r^{(k)} R_r^{(k)} \leftarrow V_r^{(k)}$. A simple group power iteration method will have poor convergence properties [31]. To overcome this, we apply convergence from the left and the right sides simultaneously by computing the left and the right eigenvectors (see Algorithm 1). We observe that Algorithm 1 always converges to the correct solution and has better numerical properties than the group power iteration.

For the proposed spectral clustering, only the signs are important. Therefore, in Algorithm 1, we replace the eigenvalue convergence based termination criterion with a stabilization of sign changes criterion between consecutive iterations as follows:
$$\Delta S = \sum \left( V_r^{(k)} > 0 \right) \oplus \left( V_r^{(k-1)} > 0 \right), \qquad (16)$$
where $\oplus$ is the XOR operator: $0 \oplus 1 = 1$, $1 \oplus 0 = 1$, $0 \oplus 0 = 0$, $1 \oplus 1 = 0$. We empirically found that most of the signs become stable after very few iterations ($\le 4$).

5. Ensemble of Spectral Classifiers

Representing image sets with Grassmannian manifolds facilitates the formulation of an ensemble of spectral classifiers. Different random initializations of $B_j^0$ in (15) may converge to different solutions, resulting in multiple image set representations. In addition, we also vary the dimensionality of the manifolds and compute a set of manifolds for each class. The proposed spectral classifier is independently applied to the manifolds of the same dimensionality over all classes and the inter-class distances are estimated. We fuse the set of distances using mode fusion and also by the sum rule. In mode fusion, the probe set label is estimated individually for each classifier and the label with the maximum frequency is selected. In sum fusion, the minimum distance over the cumulative distance vector defines the probe set label.
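The two fusion rules can be written compactly as below, assuming a matrix `dists` of shape (n_classifiers, n_classes) where `dists[m, c]` is classifier m's distance between the probe set and gallery class c; the variable names are illustrative, not the paper's.

```python
import numpy as np

def mode_fusion(dists):
    """Each classifier votes for its nearest class; the most frequent label wins."""
    votes = np.argmin(dists, axis=1)
    return int(np.bincount(votes).argmax())

def sum_fusion(dists):
    """Distances are accumulated over classifiers; the smallest total wins."""
    return int(np.argmin(dists.sum(axis=0)))
```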

6. Experimental Evaluation

Evaluations are performed for image-set based face recognition, object categorization and gesture recognition. The SPAMS [21] package is used for sparse coding.

Algorithm 1 Fast Eigen Solver: Group Power Iteration
Input: $L \in \mathbb{R}^{n \times n}$, $\epsilon_L$
Output: $U, V$ {Eigenvectors}, $\Lambda$ {Eigenvalues}
$V_r^{(0)} = I(n,n)$ {Identity matrix}
$\Lambda^{(0)} = L$, $\delta\Lambda = 1$
while $\delta\Lambda \ge \epsilon_L$ do
  $V_l^{(k)} \leftarrow L^\top V_r^{(k-1)}$
  $V_l^{(k)} \leftarrow V_l^{(k)} / (V_l^{(k)\top} V_l^{(k)})$
  $V_l^{(k)} R_l^{(k)} \leftarrow V_l^{(k)}$ {left QR decomposition}
  $V_r^{(k)} \leftarrow V_l^{(k)}$
  $V_r^{(k)} \leftarrow V_r^{(k)} / (V_r^{(k)\top} V_r^{(k)})$
  $V_r^{(k)} R_r^{(k)} \leftarrow V_r^{(k)}$ {right QR decomposition}
  $\Lambda^{(k)} \leftarrow V_r^{(k)} L V_l^{(k)\top}$
  $\delta\Lambda = \|\mathrm{diag}(\Lambda^{(k)} - \Lambda^{(k-1)})\|_2$
end while
$U \leftarrow V_r^{(k)}$, $V \leftarrow V_l^{(k)}$
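For reference, here is a NumPy sketch of a simplified, right-vectors-only subspace iteration with the sign-change stopping rule of (16); it is our own simplification of Algorithm 1 (the left/right scheme and per-iteration normalization are collapsed into a single QR step), not the exact algorithm.

```python
import numpy as np

def fast_eigenvectors(L, n_vectors, max_sign_changes=0, max_iter=50):
    """Group (subspace) power iteration for the n_vectors dominant eigenvectors of L,
    terminated once the sign pattern of the coefficients stabilizes (Eq. 16).
    For the smallest eigenvectors of a Laplacian, apply it to a shifted matrix
    such as 2*I - L_sym so that they become dominant."""
    n = L.shape[0]
    V = np.eye(n)[:, :n_vectors]                   # deterministic start; random also works
    prev_signs = V > 0
    for _ in range(max_iter):
        V, _ = np.linalg.qr(L @ V)                 # multiply and re-orthonormalize
        signs = V > 0
        delta_s = int(np.sum(signs ^ prev_signs))  # Eq. (16): XOR of sign patterns
        prev_signs = signs
        if delta_s <= max_sign_changes:
            break
    eigvals = np.diag(V.T @ L @ V)                 # Rayleigh quotient estimates
    return V, eigvals
```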

Comparisons are performed with Discriminant Canonical Correlation (DCC) [26], Manifold-Manifold Distance (MMD) [28], Manifold Discriminant Analysis (MDA) [24], linear Affine and Convex Hull based Image Set Distance (AHISD, CHISD) [8], Sparse Approximated Nearest Points (SANP) [30], and Covariance Discriminative Learning (CDL) [27]. The same experimental protocol is used for all algorithms. The codes of [8, 26, 28, 30] were provided by the original authors.

6.1. Face Recognition using Image-sets

Our first dataset is the You-tube Celebrities dataset [19], which is very challenging and includes 1910 very low resolution videos (of 47 subjects) containing motion blur, high compression, pose and expression variations (Fig. 3c). Faces were automatically detected, tracked and cropped to 30×30 gray-scale images. Due to tracking failures, our sets contained fewer images (8 to 400 per set) than the total number of video frames. The proposed algorithm performed best on HOG features. Five-fold cross validation experiments were performed where 3 image sets were randomly selected for training and the remaining 6 for testing.

Each image-set was represented by two manifold-sets by using λ = {1, 2} in (15). Each set has 8 classifiers with dimensionality increasing from 1 to 8. Recognition rates of the ensembles of each dimensionality are compared with both fusion schemes in Fig. 4a. The no-fusion case shows the average accuracy of the individual classifiers over the five folds. Note that mode fusion achieves the highest accuracy and performs better than sum fusion because error does not get accumulated. Hierarchical clustering is compared with one-step clustering in Fig. 4b, which shows the superiority of the hierarchical approach. The performance of the proposed eigenvector solver is compared with the Matlab sparse SVD


Figure 3. Two example image-sets from (a) Honda, (b) CMU Mobo and (c) You-tube Celebrities datasets. (d) ETH 80 dataset.

Figure 4. You-tube dataset: (a) Comparison of different spectral ensemble fusion schemes. (b) Comparison of one-step and iterative clustering (No Fusion). (c-d) Accuracy and execution time comparison of the proposed fast eigen-solver with Matlab SVDS.

Figure 5. Mobo dataset: Comparison of (a) hierarchical with (b) one-step spectral clustering.

(svds) in Fig. 4c. We observe higher accuracy due to the better quality embedding of Grassmannian manifolds in the Euclidean space by the proposed solver. In terms of execution time, our eigenvector solver is around 4 times faster than SVDS (Fig. 4d). This demonstrates the efficacy of our eigenvector solver for the purpose of spectral clustering. The maximum accuracy of our algorithm is 83% and the average accuracy is 76.4±5.7% using the $BM_{i,p}$ (6) distance measure. To the best of our knowledge, this is the highest accuracy reported to date on this dataset.

Table 1. Average recognition rates over 10 folds on Honda, MoBo and ETH80, 5 folds on You-Tube, and 1 fold on the Cambridge dataset.

Method   | Honda     | MoBo      | ETH80      | You-Tube   | Cambr.
DCC      | 94.7±1.3  | 93.6±1.8  | 90.9±5.3   | 53.9±4.7   | 65.0
MMD      | 94.9±1.2  | 93.2±1.7  | 85.7±8.3   | 54.0±3.7   | 58.1
MDA      | 97.4±0.9  | 97.1±1.0  | 80.50±6.81 | 55.1±4.5   | 20.9
AHISD †  | 89.7±1.9  | 97.4±0.8  | 74.76±3.3  | 60.7±5.2   | 18.1
CHISD †  | 92.3±2.1  | 96.4±1.0  | 71.0±3.9   | 60.4±5.9   | 18.3
SANP     | 93.1±3.4  | 96.9±0.6  | 72.4±5.0   | 65.0±5.53  | 22.5
CDL      | 100±0.0   | 95.8±2.0  | 89.2±6.8   | *62.2±5.1  | 73.4
Prop.    | 100±0.0   | 98.0±0.9  | 91.5±3.8   | 76.4±5.7   | 83.05

* CDL results are on different folds; therefore, the accuracy is lower than that reported by [27]. † The accuracy of AHISD and CHISD is lower than in [8] due to smaller image sizes.

The second dataset is CMU Mobo [23], containing 96 videos of 24 subjects. Face images were re-sized to 40×40 and LBP features were computed using circular (8, 1) neighborhoods extracted from 8×8 gray scale patches [8]. We performed 10-fold experiments by randomly selecting one image-set per subject as training and the remaining 3 as probes. We achieved a maximum accuracy of 100% and an average of 98.0±0.93% (Table 1), which is the highest reported so far.

Our final dataset is Honda/UCSD [13], containing 59 videos of 20 subjects with varying poses and expressions. Histogram equalized 20×20 gray scale face image pixel values were used as features [28]. We performed 10-fold experiments by randomly selecting one set per subject as gallery and the remaining 39 as probes. An ensemble of 15 spectral classifiers obtained 100% accuracy on all folds.

6.2. Object Categorization & Gesture Recognition

For object categorization, we use the ETH-80 dataset containing images of 8 object categories, each with 10 different objects. Each object has 41 images taken at different views, forming an image-set. We use 20×20 intensity images for classifying an image-set of an object into a category. ETH-80 is a challenging database because it has fewer images per set and significant appearance variations across objects of the same class. For each class, 5 random image sets are used for training and the remaining 5 for testing. We achieved an average recognition rate of 91.5±3.8% using an ensemble of 15 spectral classifiers.

The Cambridge Hand Gesture dataset [15] contains 900 image-sets of 9 gesture classes with large intra-class variations. Each class has 100 image sets, divided into two parts: sets 81-100 are used as gallery and 1-80 as probes [14]. Pixel values of 20×20 gray scale images are used as feature vectors. Using an ensemble of 9 spectral classifiers, we obtained an accuracy of 83.05%, which is higher than the other algorithms.

6.3. Robustness to Outliers

We performed robustness experiments in a setting similar to [8]. The Honda dataset was modified to have 100 randomly selected images per set.

Figure 6. Robustness to outliers experiment: (a) Corrupted gallery case. (b) Corrupted probe case.

In the first experiment, each gallery set was corrupted by adding 1 to 3 random images from each other gallery set, resulting in 19%, 38% and 57% outliers. Our algorithm achieved 100% accuracy for all three cases. In the second experiment, the probe set was corrupted by adding 1 to 3 random images from each gallery set. In this case, our algorithm achieved {100%, 100%, 97.43%} recognition rates. Our algorithm outperformed all others. Fig. 6 compares our algorithm to the nearest 2 competitors in both experiments, i.e. CDL [27] and SANP [30].

7. Conclusion

We presented an iterative sparse spectral clustering algorithm for robust image-set classification. Each image-set is represented with Grassmannian manifolds of increasing dimensionality to facilitate the use of an ensemble of spectral classifiers. An important contribution is a fast eigenvector solver which makes spectral clustering more efficient in general. Instead of the eigenvalue error, we minimize the sign changes in our group power iteration algorithm, which provides significant speedup and better quality clusters.

8. Acknowledgements

This research was supported by ARC grants DP1096801 and DP110102399. We thank T. Kim and R. Wang for the codes of DCC and MMD and the cropped faces of the Honda data. We thank H. Cevikalp for the LBP features of the Mobo data and Y. Hu for the MDA implementation.

References

[1] A. Mahmood and A. Mian. Hierarchical sparse spectral clustering for image set classification. In BMVC, 2012.
[2] S. Chen, C. Sanderson, M. T. Harandi, and B. C. Lovell. Improved image set classification via joint sparse approximated nearest subspaces. In CVPR, 2013.
[3] M. Yang, P. Zhu, L. Van Gool, and L. Zhang. Face recognition based on regularized nearest points between image sets. In FG, 2013.
[4] H. Cheng, Z. Liu, and L. Yang. Sparsity induced similarity measure for label propagation. In ICCV, 2009.
[5] Z. Cui, S. Shan, H. Zhang, S. Lao, and X. Chen. Image sets alignment for video based face recognition. In CVPR, 2012.
[6] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. TPAMI, 2013.
[7] C. Fowlkes, S. Belongie, F. Chung, and J. Malik. Spectral grouping using the Nystrom method. TPAMI, 2004.
[8] H. Cevikalp and B. Triggs. Face recognition based on image sets. In CVPR, pages 2567-2573, 2010.
[9] J. Hamm and D. Lee. Grassmann discriminant analysis: a unifying view on subspace learning. In ICML, 2008.
[10] M. Harandi, C. Sanderson, S. Shirazi, and B. Lovell. Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching. In CVPR, 2011.
[11] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. TPAMI, 31(2):210-227, 2009.
[12] Y. Chen, V. Patel, P. Phillips, and R. Chellappa. Dictionary-based face recognition from video. In ECCV, 2012.
[13] K. Lee, J. Ho, M. Yang, and D. Kriegman. Video based face recognition using probabilistic appearance manifolds. In CVPR, 2003.
[14] T. Kim and R. Cipolla. Canonical correlation analysis of video volume tensors for action categorization and detection. TPAMI, 31(8), 2009.
[15] T. Kim, K. Wong, and R. Cipolla. Tensor canonical correlation analysis for action classification. In CVPR, 2007.
[16] H. Lee, A. Battle, R. Raina, and A. Ng. Efficient sparse coding algorithms. In NIPS, 2007.
[17] H. Li, G. Hua, Z. Lin, and J. Yang. Probabilistic elastic matching for pose variant face verification. In CVPR, 2013.
[18] M. Li, X. Lian, J. Kwok, and B. Lu. Time and space efficient spectral clustering via column sampling. In CVPR, 2011.
[19] M. Kim, S. Kumar, V. Pavlovic, and H. Rowley. Face tracking and recognition with visual constraints in real world videos. In CVPR, 2008.
[20] M. Nishiyama, M. Yuasa, T. Shibata, T. Wakasugi, T. Kawahara, and O. Yamaguchi. Recognizing faces of moving people by hierarchical image-set matching. In CVPR, 2007.
[21] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In ICML, 2009.
[22] X. Peng, L. Zhang, and Z. Yi. Scalable sparse subspace clustering. In CVPR, 2013.
[23] R. Gross and J. Shi. The CMU Motion of Body (MoBo) database. Technical Report CMU-RI-TR-01-18, 2001.
[24] R. Wang and X. Chen. Manifold discriminant analysis. In CVPR, 2009.
[25] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 2000.
[26] T. Kim, O. Arandjelovic, and R. Cipolla. Discriminative learning and recognition of image set classes using canonical correlations. TPAMI, 2007.
[27] R. Wang, H. Guo, L. S. Davis, and Q. Dai. Covariance discriminative learning: A natural and efficient approach to image set classification. In CVPR, 2012.
[28] R. Wang, S. Shan, X. Chen, Q. Dai, and W. Gao. Manifold-manifold distance and its application to face recognition with image sets. TIP, 2012.
[29] B. Wu, Y. Zhang, B. Hu, and Q. Ji. Constrained clustering and its application to face clustering in videos. In CVPR, 2013.
[30] Y. Hu, A. S. Mian, and R. Owens. Face recognition using sparse approximated nearest points between image sets. TPAMI, 2012.
[31] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.