
1090 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 47, NO. 4, APRIL 2017

Cross-Domain Recognition by Identifying Joint Subspaces of Source Domain and Target Domain

Yuewei Lin, Student Member, IEEE, Jing Chen, Yu Cao, Member, IEEE, Youjie Zhou, Lingfeng Zhang, Student Member, IEEE, Yuan Yan Tang, Fellow, IEEE,

and Song Wang, Senior Member, IEEE

Abstract—This paper introduces a new method to solve the cross-domain recognition problem. Different from traditional domain adaptation methods, which rely on a global domain shift for all classes between the source and target domains, the proposed method is more flexible and captures individual class variations across domains. By adopting the natural and widely used assumption that data samples from the same class should lie on an intrinsic low-dimensional subspace, even if they come from different domains, the proposed method circumvents the limitation of the global domain shift and solves cross-domain recognition by finding the joint subspaces of the source and target domains. Specifically, given labeled samples in the source domain, we construct a subspace for each of the classes. Then we construct subspaces in the target domain, called anchor subspaces, by collecting unlabeled samples that are close to each other and highly likely to belong to the same class. The corresponding class label is then assigned by minimizing a cost function that reflects the overlap and topological structure consistency between subspaces across the source and target domains, and within the anchor subspaces, respectively. We further combine the anchor subspaces with the corresponding source subspaces to construct the joint subspaces. Subsequently, one-versus-rest support vector machine classifiers are trained using the data samples belonging

Manuscript received December 13, 2015; revised February 20, 2016; accepted March 1, 2016. Date of publication March 21, 2016; date of current version March 15, 2017. This work was supported in part by the Air Force Office of Scientific Research under Grant FA9550-11-1-0327, in part by the National Science Foundation under Grant IIS-1017199, in part by the U.S. Army Research Laboratory under Grant W911NF-10-2-0060 (U.S. Defense Advanced Research Projects Agency Minds Eye Program), in part by the Research Grants of the University of Macau under Grant MYRG2015-00049-FST, Grant MYRG2015-00050-FST, and Grant RDG009/FST-TYY/2012, in part by the Science and Technology Development Fund of Macau under Grant 100-2012-A3 and Grant 026-2013-A, in part by the Macau-China Joint Project under Grant 008-2014-AMJ, and in part by the National Natural Science Foundation of China under Grant 61273244. A preliminary version of this paper appeared in [1]. This paper was recommended by Associate Editor D. Tao.

Y. Lin, Y. Zhou, and S. Wang are with the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208 USA (e-mail: [email protected]; [email protected]; [email protected]).

J. Chen is with Chongqing University, Chongqing 400030, China (e-mail: [email protected]).

Y. Cao is with IBM Almaden Research Center, San Jose, CA 95120 USA (e-mail: [email protected]). He participated in this work while pursuing his Ph.D. degree at the University of South Carolina, Columbia, SC, USA. His contact information refers only to his affiliation rather than any intellectual property.

L. Zhang is with the University of Houston, Houston, TX 77004 USA(e-mail: [email protected]).

Y. Y. Tang is with the University of Macau, Macau, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCYB.2016.2538199

to the same joint subspaces and applied to unlabeled data in the target domain. We evaluate the proposed method on two widely used datasets: 1) an object recognition dataset for computer vision tasks and 2) a sentiment classification dataset for natural language processing tasks. Comparison results demonstrate that the proposed method outperforms the comparison methods on both datasets.

Index Terms—Cross-domain recognition, joint subspace, unsupervised.

I. INTRODUCTION

MANY machine learning methods assume that the training data (labeled) and testing data (unlabeled) are from the same feature space and follow similar distributions. However, this assumption may not hold in many real applications: the training data are obtained from one domain, while the testing data come from a different domain. As a visual example, Fig. 1 shows coffee-mug images collected from four different domains [Amazon, Caltech, digital single-lens reflex (DSLR), and Webcam], which present different image resolutions (Webcam versus DSLR), viewpoints (Webcam versus Amazon), background complexities (Amazon versus Caltech), object layout patterns, etc.

On the other hand, the data samples also show different distributions in the feature space, as illustrated in Fig. 2. The 2-D plots show the first two feature dimensions reduced from the original 800-D speeded-up robust features (SURF) feature space (described in Section IV-D) using principal component analysis (PCA). The first and second rows of Fig. 2 show the distributions of "monitor" and "projector," and of "mouse" and "mug," in different domains, respectively. It is clear that the data samples from different domains have different distributions. Moreover, the relations between two classes in different domains also differ. Taking the second row of Fig. 2 as an example, in the Webcam domain (the last column) the mug samples (green circles) are usually located at the top-right side of the mouse ones (red circles), but in the Amazon domain (the first column) the mug samples are usually located at the left side of the mouse ones.
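The 2-D projection described above can be reproduced with a short PCA-via-SVD sketch. The random matrices below are stand-ins for the real 800-D SURF features, which are not part of this document:

```python
import numpy as np

# Stand-ins for 800-D SURF feature matrices from two domains (random data;
# the real features come from the object recognition dataset of Section IV-D).
rng = np.random.default_rng(0)
amazon = rng.normal(size=(100, 800))
webcam = rng.normal(loc=0.3, size=(120, 800))

# PCA on the pooled samples: center, then project onto the top-2 right
# singular vectors to obtain 2-D coordinates like those plotted in Fig. 2.
pooled = np.vstack([amazon, webcam])
mean = pooled.mean(axis=0)
_, _, Vt = np.linalg.svd(pooled - mean, full_matrices=False)
amazon_2d = (amazon - mean) @ Vt[:2].T  # shape (100, 2)
webcam_2d = (webcam - mean) @ Vt[:2].T  # shape (120, 2)
```

Plotting the two projected sets in one figure reproduces the kind of per-domain distribution comparison shown in Fig. 2.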

These domain differences lead to a dilemma: 1) directly applying classifiers trained on one domain to another may result in significantly degraded performance [2] and 2) labeling data in each domain as training samples would be very expensive, especially in large-scale applications. This dilemma poses the cross-domain recognition problem, namely how to utilize the labeled data in a source

2168-2267 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.



Fig. 1. Sample images from four different domains: Amazon, Caltech, DSLR, and Webcam.

domain to classify/recognize the unlabeled data in a target domain.

To achieve cross-domain recognition, a number of domain adaptation (DA) methods have been developed to adapt a classifier from one domain to another; they can be considered a specific type of transfer learning [3]–[7]. Subspace-based DA methods have been found to be very effective for the cross-domain problem [8]–[14]. They either construct a set of intermediate subspaces to model the shift between domains [8], [9], [11], [13], or generate a domain-invariant subspace in which the data from the source and target domains can represent each other well [10], [12], [14]. All the methods mentioned above use the data from a domain all together to generate a single subspace for that domain. In practice, however, the intrinsic feature shift of each class may not be exactly the same. The existing methods can obtain a global domain shift, but ignore the individual class differences across domains.

To circumvent the limitation of the global domain shift, we adopt the natural and widely used assumption that data samples from the same class should lie on an intrinsic low-dimensional subspace, even if they come from different domains [15]. This assumption not only holds in many computer vision tasks, such as face recognition under varying illumination [16] and handwritten digit recognition [17], but is also used as a human cognitive mechanism for visual object recognition [18]. Note that this assumption does not mean that the target data samples lie exactly on the intrinsic low-dimensional subspace of the source samples, since different domains show subspace shift [11]. Fig. 3 gives a toy illustration of a joint subspace covering the source domain and target domain for a specific class. The source and target subspaces have overlap bases, which implicitly represent the intrinsic characteristics of the considered class. They also have their own exclusive bases because of the domain shift, such as varying illumination or a change of view perspective.

We also give an example to show the bases of subspaces using the sparse-subspace representation [15] of the data. Given N data samples X = {x_i}_{i=1}^N, each data sample can be reconstructed by a sparse linear combination of the other samples, based on the self-expressiveness property of the data. That is

min ||C||_0   s.t.   X = XC,  diag(C) = 0                    (1)

where c_ij is the coefficient of the sparse representation for reconstructing x_i by x_j. Therefore, the average value of the ith row of C, i.e., R_i = (1/N) Σ_j |c_ij|, denotes the importance of the ith data sample for reconstructing the other data samples. We treat the ith data sample as one basis of the intrinsic low-dimensional subspace of X if R_i ≥ ε.

We consider the object "backpack" from the Amazon dataset as the source domain S (92 data samples) and the same object from Caltech as the target domain T (151 data samples). Using the sparse-subspace representation of the union dataset J = S ∪ T, we can obtain the bases of the joint subspace. We treat the ith data sample in J as an overlap basis if it is important for reconstructing the data samples from both S and T, i.e., (1/92) Σ_{j=1}^{92} |c_ij| ≥ ε and (1/151) Σ_{j=93}^{243} |c_ij| ≥ ε, since the ith data sample then belongs to both domains. Fig. 4(a) and (b) shows the overlap bases of the joint subspace in S and T, respectively. Fig. 4(c) and (d) shows the bases of the source and target intrinsic subspaces, respectively. The red bases are the same as the overlap ones; therefore, the blue ones are the exclusive bases of their own subspaces.
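Solving (1) exactly with the l0 norm is intractable; sparse-subspace methods typically relax it to an l1-regularized regression per sample. The sketch below uses scikit-learn's Lasso as a stand-in for the exact solver in [15]; the toy data, the regularization weight alpha, and the threshold choice are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def self_expressive_coefficients(X, alpha=0.1):
    """Solve x_i ≈ X c_i with c_ii = 0 for every sample (l1 relaxation of (1)).

    X: (d, N) data matrix, one sample per column. Returns C of shape (N, N),
    where column i holds the coefficients reconstructing x_i.
    """
    d, N = X.shape
    C = np.zeros((N, N))
    for i in range(N):
        mask = np.arange(N) != i  # enforce diag(C) = 0: exclude x_i itself
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        model.fit(X[:, mask], X[:, i])
        C[mask, i] = model.coef_
    return C

# Toy data: 30 samples lying on a 2-D subspace embedded in 10-D.
rng = np.random.default_rng(0)
basis = rng.normal(size=(10, 2))
X = basis @ rng.normal(size=(2, 30))

C = self_expressive_coefficients(X)
R = np.abs(C).mean(axis=1)              # importance score R_i (row average)
bases_idx = np.where(R >= R.mean())[0]  # samples treated as subspace bases
```

Here the mean of R plays the role of the threshold ε; the paper leaves the concrete value of ε unspecified at this point.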

Based on the above assumption, we propose a new method that solves cross-domain recognition by finding the joint subspaces of the source and target domains. Specifically, given labeled samples in the source domain, we construct an intrinsic subspace for each of the classes. Then we construct subspaces in the target domain, called anchor subspaces, by collecting unlabeled samples that are close to each other and highly likely to belong to the same class. The corresponding class label is then assigned by minimizing a cost function that reflects the overlap and topological structure consistency between subspaces across the source and target domains, and within the anchor subspaces, respectively. We further combine the anchor subspaces with the corresponding source subspaces to construct the joint subspaces for each class. Subsequently, a support vector machine (SVM) classifier is trained using the samples in the joint subspace and applied to the unlabeled data in the target domain for classification.

The contributions of this paper are as follows.

1) By assuming that the data samples from one specific class, even though they come from different domains, should lie on an intrinsic low-dimensional subspace, we generate one joint subspace for each class independently. Each joint subspace carries information not only about the intrinsic characteristics of the corresponding class, but also about the specificity of each domain.

2) To construct the joint subspaces, we first generate anchor subspaces in the target domain, assign labels to them, and combine these anchor subspaces with the corresponding source subspaces.

3) We propose a cost function that implicitly maximizes the overlap between the source subspace and the target subspace for each class as well as maintaining the



Fig. 2. Illustrations of sample distributions of different domains in feature space. The 2-D plots are the first two feature dimensions reduced from the original 800-D SURF feature space (described in Section IV-D), using PCA. The first row shows the distributions of monitor and projector in four domains. The second row shows the distributions of mouse and mug in four domains. Best viewed in color.

Fig. 3. Illustration of a joint subspace between the source and target domains for a specific class. This joint subspace consists of overlap bases between domains, which implicitly represent the intrinsic characteristics of this class, and exclusive bases of the different domains, which represent the exclusive characteristics of each domain.


Fig. 4. (a) and (b) Overlap bases of the joint subspace in the source and target domains, respectively. (c) and (d) Bases of the source and target domains, respectively. The red bases are the same as the overlap bases, while the blue bases are the exclusive bases of their own subspaces. Best viewed in color.

topological structure in the target domain. We use the principal angles as the subspace distances in the cost function, instead of the data-sample distances that were usually used in previous methods.

Note that, in this paper, when we refer to the subspace of a set of data samples, we do not explicitly compute the bases of its intrinsic low-dimensional subspace. Instead, we use the original data samples to represent the intrinsic subspace implicitly. There are two reasons for directly using the original data.

1) There is no need to obtain the bases, since we do not project the original data into a new space. We only implicitly use the bases of the subspace when we calculate the distance between subspaces in Section IV-B. Using the original data avoids the additional computational cost of calculating the bases and the projection.

2) All the data samples used either for training or testing should be in the same space. Therefore, we do not project them into different subspaces.

The remainder of this paper is organized as follows. We elaborate on related work in Section II. Section III describes the proposed method. Quantitative experimental results are presented in Section IV. Section V concludes this paper.

II. RELATED WORK

For the cross-domain recognition problem, domain adaptation is the most closely related line of work and a fundamental class of methods in machine learning and computer vision. Here, we give a brief review of this topic; please refer to [19] for a comprehensive survey.

Traditional DA algorithms can be categorized into two types, i.e., (semi-)supervised domain adaptation and unsupervised domain adaptation, based on the availability of labeled data from the target domain. Both semisupervised and supervised DA assume that labeled data are available in the target domain. In supervised DA, there is a relatively large number of labeled data samples in the target domain. In semisupervised DA, only a few labeled data samples are available, motivated by standard semisupervised learning [20]–[24]. Daumé [25] proposed to map the data from both source and target domains to a high-dimensional feature space, and trained classifiers in this new feature space. Saenko et al. [26] proposed a metric learning approach that can adapt labeled data of a few classes from the target domain to the



unlabeled target classes. Kumar et al. [27] proposed a coregularization model that augments the feature space to jointly model the source and target domains. Chen et al. [28] proposed a cotraining-based domain adaptation method: they first train an initial category model on samples from the source domain and then use it to label samples from the target domain; the category model is then updated using the newly labeled target samples through cotraining. Pan et al. [29] analyzed the transfer components that map both domains into a kernel space that preserves some properties of the domain-specific data distributions. Duan et al. [30] proposed an SVM-based method that minimizes the mismatch between the source and target domains, using both labeled and unlabeled data. Shekhar et al. [12] proposed to learn a single dictionary to represent both source and target domains. The work in [31] proposed to use a linear transformation to map features from the target domain to the source domain and to generate the classification model trained on the source domain and the target samples based on feature transformation. Motivated by the recent success of deep learning, hierarchical domain adaptation methods have also been proposed [32], which need large-scale data to (pre-)train the deep neural network model. Xiao and Guo [33] proposed a semisupervised kernel-matching-based domain adaptation method that learns a prediction function on the source domain while mapping the target samples to similar source samples by matching the target kernel matrix to the source kernel matrix.

Unsupervised DA, on the other hand, does not use any label information in the target domain, and is considered more challenging and more useful in real-world applications. Gopalan et al. [8], [34] constructed a set of intermediate subspaces along the geodesic path that links the source and target domains on the Grassmann manifold. Gong et al. [9] proposed a geodesic flow kernel (GFK) to model the shift between the source and target domains. In [11], a new method was proposed to construct the subspaces by gradually reducing the reconstruction error of the target data instead of using manifold walking strategies. Rather than using the geodesic path as in [8] and [34], Caseiro et al. [35] proposed to compute a spline curve on the Grassmann manifold. Jhuo et al. [10] learned a transformation so that the source samples can be represented by target samples in a low-rank way. Fernando et al. [36] proposed to learn a mapping function that aligns the sample representations from the source and target domains. Tommasi and Caputo [37] proposed a naive Bayes nearest-neighbor-based domain adaptation algorithm that iteratively learns a class metric while inducing a large margin separation among classes for each sample. Baktashmotlagh et al. [38] proposed to use the Riemannian metric as a measure of distance between the distributions of the source and target domains. Long et al. [14] proposed to learn a domain-invariant representation by jointly performing feature matching and instance weighting. Cui et al. [13] treated the samples from each domain as one point (i.e., a covariance matrix) on a Riemannian manifold, and then interpolated some intermediate points along the geodesic path, which are used to bridge the two domains.

The algorithm proposed in this paper is an unsupervised cross-domain recognition method. It differs from traditional DA methods by constructing an intrinsic low-dimensional joint subspace for each class independently. This avoids the global domain shift limitation and captures individual domain variations for each class.

III. PROPOSED MODEL

A. Problem Setting

Suppose there are two sets of data samples, one from the source domain S, denoted as {x_i^S}_{i=1}^{N_S} ∈ R^{d×N_S}, and the other from the target domain T, denoted as {x_i^T}_{i=1}^{N_T} ∈ R^{d×N_T}, where d is the data dimension and N_S and N_T denote the number of data samples in the source and target domains, respectively. The labels of all data samples in the source domain, denoted as Y^S = {y_i^S}_{i=1}^{N_S} ∈ R^{C×N_S}, are known, where C is the number of classes and y_i^S ∈ {0, 1}^C is a C-bit binary code of the ith data sample in the source domain: if this data sample belongs to class j, the jth bit of y_i^S is 1 and all the other bits are 0. Our aim is to estimate Y^T ∈ R^{C×N_T}, the labels of all the data samples in the target domain.
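The C-bit label codes y_i^S described above can be assembled column by column into a one-hot label matrix. A minimal numpy sketch (the class indices here are illustrative):

```python
import numpy as np

def one_hot_labels(class_ids, C):
    """Build Y^S ∈ {0,1}^{C×N_S}: column i is the C-bit code of sample i,
    with a single 1 in the row of that sample's class."""
    N = len(class_ids)
    Y = np.zeros((C, N), dtype=int)
    Y[class_ids, np.arange(N)] = 1
    return Y

# Four source samples with class indices 0, 2, 1, 2 out of C = 3 classes.
Y_S = one_hot_labels(np.array([0, 2, 1, 2]), C=3)
```

Each column of Y_S sums to one, matching the C-bit binary code defined in the problem setting.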

B. Overview

The proposed algorithm aims to construct a set of joint subspaces {M_i^{SS}}_{i=1}^C, which cover the source and target domains, one for each of the C classes, and then train classifiers on the data samples belonging to these joint subspaces. As shown in Fig. 3, since a joint subspace is constituted by a source subspace and a target subspace, we need to construct the source and target subspaces first. Hence, the proposed algorithm consists of five steps.

1) Constructing Subspaces in the Source Domain: We construct a set of subspaces {M_i^S}_{i=1}^C, one for each class in the source domain. As mentioned above, we do not need to compute a set of bases to represent a subspace. We implicitly construct the intrinsic low-dimensional subspace for each class by using the original data samples, i.e., we take all the data that belong to this class. Hence, each source subspace is M_i^S = {x_j^S | x_j^S ∈ C_i}, where C_i denotes the ith class, as illustrated in Fig. 5(a).

2) Constructing Anchor Subspaces in the Target Domain: To estimate the target subspace for each class, we construct a number of anchor subspaces in the target domain, denoted as {M_i^T}_{i=1}^K, as illustrated in Fig. 5(b). These anchor subspaces are expected to: 1) carry the information of target exclusive characteristics and 2) be compact. The data samples in the target domain naturally satisfy the first expectation. To satisfy the second expectation, the data samples in one anchor subspace should be from the same class, since a subspace constructed from samples of different classes is usually less compact than one constructed from samples of the same class. Thus, the basic idea is to ensure that each anchor subspace only contains data samples from a single class, so that it can be combined with a source subspace to construct the joint subspace compactly. Since the data in the target domain are unlabeled, we construct the anchor subspaces by grouping target data samples with high similarities. This is motivated by the locality principle: a data sample usually lies in close proximity to a small number of samples from the same class [39].



Fig. 5. Overview of the proposed model. (a) Subspaces for each class in the source domain. The bars with the same color denote the data samples of one class. (b) Anchor subspace construction. The points in the target domain are the data samples. Data samples in each circle denote a core subgroup, and they construct an anchor subspace as one row of black ellipses. (c) Joint subspace construction. The ellipses denote the data samples from anchor subspaces (target domain). Best viewed in color.

3) Labeling the Anchor Subspaces: Since the joint subspaces are constructed independently for each class, we assign a label to each anchor subspace. For this purpose, we propose a cost function that reflects: 1) the cross-domain distance between the anchor subspace and the corresponding source subspaces and 2) the within-domain topological relation of the anchor subspaces in the target domain. A shorter cross-domain distance reflects the desirability of more overlap bases in constructing a joint subspace. In other words, minimizing the proposed cost function implicitly maximizes the overlap between the source and target subspaces.

4) Constructing Joint Subspaces: We construct the joint subspace M_i^{SS} = {M_i^S, M_j^T | M_j^T ∈ C_i}, where C_i denotes the ith class, as illustrated in Fig. 5(c). As mentioned before, we simply take all the original data samples from all the involved subspaces to implicitly construct the joint subspaces.

5) Training Classifiers on the Joint Subspaces: We train one-versus-rest linear SVM classifiers, one for each class, using the labeled data samples in the joint subspaces. Specifically, for a class i, a linear SVM classifier is trained using the labeled data samples belonging to M_i^{SS}, i.e., the source data samples with label i and the target data samples assigned label i. Then we apply the linear SVM classifiers to the unlabeled data in the target domain for classification.

C. Anchor Subspaces Obtained in Target Domain

We construct each anchor subspace by selecting one targetdata sample and combining it with its nearest neighbors. Thisway, the obtained compact group of data samples are likelyto be from the same class [39]. Specifically, we first applythe K-means algorithm to cluster all the target data into alarge number of Z groups. We set Z = (NT/γ ), where γ

is the desired average group size. In each group of data, we

find a compact core subgroup consisting of a small number N of samples, e.g., N = 5, which are taken for constructing an anchor subspace. In this paper, the core subgroup for a group L is constructed by the following two steps.

1) Estimate the center of the core subgroup by finding the data sample

x^* = \arg\min_{x \in L} \sum_{x' \in \mathcal{N}_{N-1}(x)} \| x - x' \|^2 \qquad (2)

where \mathcal{N}_{N-1}(x) denotes the N − 1 nearest neighbors of x in L.

2) Take x^* \cup \mathcal{N}_{N-1}(x^*) as the core subgroup for constructing an anchor subspace.

For the groups that contain fewer than N data samples, we do not construct an anchor subspace.
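The two-step core-subgroup selection above can be sketched in a few lines of NumPy (function and variable names are ours, not the authors'):

```python
import numpy as np

def core_subgroup(group, n=5):
    """Select the core subgroup of one K-means group (rows of `group`).

    Step 1 of Eq. (2): find the sample x* minimizing the summed squared
    distance to its n-1 nearest neighbors within the group. Step 2:
    return x* together with those neighbors. Groups with fewer than n
    samples yield no anchor subspace.
    """
    if len(group) < n:
        return None
    d2 = ((group[:, None, :] - group[None, :, :]) ** 2).sum(axis=2)
    nn_sum = np.sort(d2, axis=1)[:, 1:n].sum(axis=1)  # Eq. (2) per candidate
    star = int(np.argmin(nn_sum))                     # center x*
    members = np.argsort(d2[star])[:n]                # x* plus its n-1 neighbors
    return group[members]
```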

D. Labeling Each Anchor Subspace

Note that we have constructed C subspaces in the source domain, one for each class, denoted as \{M^S_i\}_{i=1}^C. Their corresponding labels, denoted as Y = \{y_i\}_{i=1}^C \in \mathbb{R}^{C \times C}, form an identity matrix, i.e., the ith bit of y_i is 1 and all the other bits of y_i are 0. In this section, we develop a new strategy to assign class labels Y' = \{y'_i\}_{i=1}^K \in \mathbb{R}^{C \times K} for the K anchor subspaces \{M^T_i\}_{i=1}^K constructed in the target domain. This strategy includes two main components: 1) the similarity between subspaces and 2) the cost function for subspace label assignment.

1) Distance Between Subspaces: To calculate the distance between two subspaces, principal angles are usually used [9], [40]. Principal angles, which are defined between two orthonormal subspaces, serve as a classic tool in many areas of computer science, such as computer vision and machine learning.

We follow the definition in [40]: given two orthonormal matrices M_1 and M_2, the principal angles 0 \le \theta_1 \le \cdots \le \theta_m \le \pi/2 between the two subspaces \mathrm{span}(M_1) and \mathrm{span}(M_2) are defined by

\cos\theta_k = \max_{u_k \in \mathrm{span}(M_1)} \max_{v_k \in \mathrm{span}(M_2)} u_k^T v_k
\quad \text{s.t.} \quad u_k^T u_k = 1, \; v_k^T v_k = 1, \; u_k^T u_i = 0, \; v_k^T v_i = 0 \; (i = 1, \ldots, k-1). \qquad (3)

The principal angles are related to the geodesic distance between M_1 and M_2 as d_G^2(M_1, M_2) = \sum_i \theta_i^2 [40].

In practice, the principal angles are usually computed from the singular value decomposition (SVD) of M_1^T M_2, i.e., M_1^T M_2 = U(\cos\Theta)V^T, where U = [u_1, \ldots, u_m], V = [v_1, \ldots, v_n], and \cos\Theta is the diagonal matrix \mathrm{diag}(\cos\theta_1, \ldots, \cos\theta_{\min(m,n)}).

In this paper, as shown in Section III-D2, we need both the cross-domain subspace distance and the within-domain subspace distance to define the cost function for anchor subspace labeling. For the cross-domain subspace distance, e.g., between subspace M^S_i with s_i data samples in the source domain and M^T_j with t_j data samples in the target domain, we first orthogonalize both of them to obtain \bar{M}^S_i and \bar{M}^T_j, and then calculate


the distance as

D(\bar{M}^S_i, \bar{M}^T_j) \triangleq \sum_{k=1}^{\min(s_i, t_j)} \sin\theta_k \qquad (4)

where the \theta_k come from the SVD (\bar{M}^S_i)^T \bar{M}^T_j = U(\cos\Theta)V^T.

Similarly, for the within-domain subspace distance, e.g., between two target (anchor) subspaces M^T_i and M^T_j, each with N data samples, we first orthogonalize both of them to obtain \bar{M}^T_i and \bar{M}^T_j, and then calculate the distance as

D(\bar{M}^T_i, \bar{M}^T_j) \triangleq \sum_{k=1}^{N} \sin\theta'_k \qquad (5)

where the \theta'_k come from the SVD (\bar{M}^T_i)^T \bar{M}^T_j = U(\cos\Theta)V^T.

The subspace distance defined above follows the assumption that samples within the same class share the same subspace even though they are from different domains. Consequently, the distance between subspaces of a specific class across the source and target domains tends to be smaller than that between different classes. We show a quantitative comparison in Section IV-B to demonstrate this advantage.

We further generate two affinity matrices: a C × K matrix A^{ST} to reflect the distances between the K anchor subspaces and the C source subspaces, and a K × K matrix A^{TT} to reflect the pairwise distances among the K anchor subspaces. More specifically, we have A^{ST}(i, j) = \exp(-D(\bar{M}^S_i, \bar{M}^T_j)/2\sigma^2) and A^{TT}(i, j) = \exp(-D(\bar{M}^T_i, \bar{M}^T_j)/2\sigma^2).

2) Cost Function and Optimization: Two important issues are considered in assigning a label to each anchor subspace: 1) the distance between an anchor subspace and the source subspace of the same label should be small and 2) the local topological structures in the target domain should be preserved [41], i.e., anchor subspaces separated by shorter distances are preferably assigned to the same class. Considering these two issues, we propose the following cost function:

C(Y') = \sum_{i=1}^{C} \sum_{j=1}^{K} \| y_i - y'_j \|^2 A^{ST}_{ij} + \rho \sum_{j=1}^{K} \sum_{j'=1}^{K} \| y'_j - y'_{j'} \|^2 A^{TT}_{jj'} \qquad (6)

where A^{ST} and A^{TT} are the affinity matrices of the interdomain subspace pairs and the subspace pairs within the target domain, respectively.
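Putting Eqs. (4)–(6) together, the affinity matrices and the cost can be sketched in NumPy (sigma, rho, and all function names below are illustrative choices, not the authors' code):

```python
import numpy as np

def subspace_distance(M1, M2):
    """D(M1, M2): sum of sin(theta_k) over the principal angles (Eqs. 4-5)."""
    Q1, _ = np.linalg.qr(M1)
    Q2, _ = np.linalg.qr(M2)
    s = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return np.sum(np.sqrt(1.0 - s ** 2))          # sin(theta) from cos(theta)

def affinities(source, anchors, sigma=1.0):
    """C x K matrix A_ST and K x K matrix A_TT via a Gaussian kernel."""
    A_ST = np.array([[np.exp(-subspace_distance(s, t) / (2 * sigma ** 2))
                      for t in anchors] for s in source])
    A_TT = np.array([[np.exp(-subspace_distance(t1, t2) / (2 * sigma ** 2))
                      for t2 in anchors] for t1 in anchors])
    return A_ST, A_TT

def cost(Y, Yp, A_ST, A_TT, rho=1.0):
    """Cost of Eq. (6): Y is C x C (one-hot columns), Y' is C x K."""
    c1 = sum(A_ST[i, j] * np.sum((Y[:, i] - Yp[:, j]) ** 2)
             for i in range(A_ST.shape[0]) for j in range(A_ST.shape[1]))
    c2 = sum(A_TT[j, k] * np.sum((Yp[:, j] - Yp[:, k]) ** 2)
             for j in range(A_TT.shape[0]) for k in range(A_TT.shape[1]))
    return c1 + rho * c2
```

With anchors that coincide with their source subspaces, the correct label assignment yields a strictly lower cost than a swapped one, which is exactly what the minimization exploits.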

Adding a constant term \sum_{i=1}^{C}\sum_{j=1}^{C} \| y_i - y_j \|^2 I_{ij} into C and splitting the first term into two parts, we get

C(Y') = \frac{1}{2}\sum_{i=1}^{C}\sum_{j=1}^{K} \| y_i - y'_j \|^2 A^{ST}_{ij} + \frac{1}{2}\sum_{j=1}^{C}\sum_{i=1}^{K} \| y_j - y'_i \|^2 A^{ST}_{ji}
+ \rho \sum_{j=1}^{K}\sum_{j'=1}^{K} \| y'_j - y'_{j'} \|^2 A^{TT}_{jj'} + \sum_{i=1}^{C}\sum_{j=1}^{C} \| y_i - y_j \|^2 I_{ij} \qquad (7)

where I_{ij} is the (i, j)th element of the identity matrix I.

Note that the first and second terms are equal. The cost function C can be further written as

C(\bar{Y}) = \sum_{i=1}^{C+K} \sum_{j=1}^{C+K} \| \bar{Y}_i - \bar{Y}_j \|^2 A_{ij}, \quad \text{s.t. } \bar{Y}^T \mathbf{1} = \mathbf{1} \qquad (8)

where \bar{Y} = [Y, Y'] and

A = \begin{bmatrix} I & \frac{1}{2}A^{ST} \\ \frac{1}{2}(A^{ST})^T & \rho A^{TT} \end{bmatrix}.

We also relax the constraint of this cost function by only requiring the sum of each column in \bar{Y} to be 1.

By including the constraint term, the cost function can be written in matrix form [42]

L(\bar{Y}, \lambda) = \mathrm{Tr}(\bar{Y}\Delta\bar{Y}^T) + \lambda^T(\bar{Y}^T\mathbf{1} - \mathbf{1}) + \frac{\mu}{2}\| \bar{Y}^T\mathbf{1} - \mathbf{1} \|_2^2 \qquad (9)

where \Delta = D - A is the Laplacian matrix of A, D is the degree matrix, i.e., a diagonal matrix with D_{ii} = \sum_j A_{ij}, and \lambda \in \mathbb{R}^{C+K} is the Lagrange multiplier. To minimize the objective function L, we separate it into two steps that update the two unknowns \bar{Y} and \lambda alternately.

Step 1: With \lambda fixed, optimize \bar{Y} by computing the derivative of L with respect to \bar{Y} and setting it to zero

\frac{\partial L(\bar{Y}, \lambda)}{\partial \bar{Y}} = 0 \;\Rightarrow\; \bar{Y}\Delta + \mathbf{1}\lambda^T + \mu(\mathbf{1}\mathbf{1}^T\bar{Y} - \mathbf{1}\mathbf{1}^T) = 0. \qquad (10)

Note that \bar{Y} contains two parts, Y and Y', with Y known. Thus, to solve it, as in [43], we first split the Laplacian matrix \Delta into four blocks along the Cth row and column

\Delta = \begin{bmatrix} \Delta_{CC} & \Delta_{CK} \\ \Delta_{KC} & \Delta_{KK} \end{bmatrix}.

Similarly, we separate \lambda into two parts, \lambda_1 = [\lambda_1, \lambda_2, \ldots, \lambda_C]^T and \lambda_2 = [\lambda_{C+1}, \lambda_{C+2}, \ldots, \lambda_{C+K}]^T. Then Y' can be updated by solving the following equation:

Y'^{(k+1)}\Delta_{KK} + \mu\mathbf{1}\mathbf{1}^T Y'^{(k+1)} = \mu\mathbf{1}\mathbf{1}^T - Y\Delta_{CK} - \mathbf{1}\lambda_2^{(k)T}. \qquad (11)

The solution is given by the Lyapunov equation [44]. With this solution, \bar{Y}^{(k+1)} is obtained by putting Y and Y'^{(k+1)} together as \bar{Y}^{(k+1)} = [Y, Y'^{(k+1)}].

Step 2: With \bar{Y} fixed, perform a gradient-ascent update with step size \mu on the Lagrange multipliers:

\lambda^{(k+1)} = \lambda^{(k)} + \mu(\bar{Y}^{(k+1)T}\mathbf{1} - \mathbf{1}). \qquad (12)

To initialize this optimization process, we simply set \lambda^{(0)} and Y'^{(0)} to zero, and set the maximal number of iterations maxIter to 1000. The whole procedure is summarized in Algorithm 1.

Since we only require the sum of each column in \bar{Y} to be 1, after we obtain Y', we set the bit with the maximal value in each column to 1 and all the other bits to 0.
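The alternating scheme above can be sketched as a toy NumPy routine; Eq. (11) is solved here by Kronecker vectorization instead of a dedicated Lyapunov solver, which is equivalent for the small matrices involved (mu, the tolerance, and all names are our assumptions, not the authors' implementation):

```python
import numpy as np

def label_anchors(A, C, mu=1.0, max_iter=1000, tol=1e-8):
    """Sketch of the anchor-labeling iteration on affinity matrix A.

    A is the (C+K)x(C+K) affinity of Eq. (8); the first C rows/columns
    are the source subspaces. Alternates an exact solve of Eq. (11) for
    Y' (via vec: (Dkk^T kron I + mu I kron 11^T) vec(Y') = vec(RHS))
    with the multiplier ascent of Eq. (12).
    """
    n = A.shape[0]
    K = n - C
    Lap = np.diag(A.sum(axis=1)) - A              # Laplacian Delta = D - A
    D_CK, D_KK = Lap[:C, C:], Lap[C:, C:]
    Y = np.eye(C)                                 # known source labels
    Yp = np.zeros((C, K))
    lam = np.zeros(n)
    ones_C = np.ones((C, 1))
    M = np.kron(D_KK.T, np.eye(C)) + mu * np.kron(np.eye(K), ones_C @ ones_C.T)
    for _ in range(max_iter):
        rhs = mu * ones_C @ np.ones((1, K)) - Y @ D_CK - ones_C @ lam[C:][None, :]
        Yp = np.linalg.solve(M, rhs.flatten(order='F')).reshape((C, K), order='F')
        resid = np.hstack([Y, Yp]).T @ np.ones(C) - 1.0   # column sums minus 1
        lam = lam + mu * resid                            # Eq. (12)
        if np.linalg.norm(resid) < tol:
            break
    return Yp, Yp.argmax(axis=0)                  # hard one-hot assignment
```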

IV. EXPERIMENTAL RESULTS

In this section, we first give the evaluation results on two components of the proposed algorithm, i.e., the subspace distance we used and the anchor subspace labeling. Then we evaluate the proposed algorithm comprehensively on two widely used cross-domain recognition datasets: 1) an object


Algorithm 1 Labeling the Anchor Subspaces
Input: Affinity matrix A, maxIter, labels Y for M^S.
Output: Labels Y' for M^T.
Initialization: set \lambda^{(0)} and Y'^{(0)} to zero.
1: Do
2: Update Y'^{(k+1)} by solving Y'^{(k+1)}\Delta_{KK} + \mu\mathbf{1}\mathbf{1}^T Y'^{(k+1)} = \mu\mathbf{1}\mathbf{1}^T - Y\Delta_{CK} - \mathbf{1}\lambda_2^{(k)T}.
3: Update \bar{Y}^{(k+1)} = [Y, Y'^{(k+1)}].
4: Update \lambda^{(k+1)} = \lambda^{(k)} + \mu(\bar{Y}^{(k+1)T}\mathbf{1} - \mathbf{1}).
5: Check the convergence.
6: Until convergence or maxIter iterations reached.

TABLE I: NUMBERS OF DATA SAMPLES IN EACH CLASS FROM FOUR DOMAINS

recognition image dataset for computer vision tasks and 2) a sentiment classification dataset for natural language processing tasks.

A. Dataset and Experimental Configurations

The first dataset that we evaluate on is an image dataset. The whole dataset has four subdatasets, which we use as four domains, with 2533 images from ten classes in total, as shown in Table I. The first three subdatasets were collected by [26] and include images from amazon.com (Amazon), images collected with a digital SLR (DSLR), and images collected with a webcam (Webcam). The fourth domain is the Caltech dataset (Caltech) [45]. For simplicity, hereafter we use "A," "C," "D," and "W" to denote the "Amazon," "Caltech," "DSLR," and "Webcam" domains, respectively.

In the natural language processing task, customers' reviews on four different products (kitchen appliances, DVDs, books, and electronics) are collected as four domains [46]. Each review consists of comment text and a rating from 0 to 5. Reviews with a rating higher than 3 are classified as positive samples, and the remaining reviews are classified as negative samples. In total, there are 1000 positive reviews and 1000 negative reviews in each domain. The goal of this task is to adapt the classifier trained on one domain and use it for classifying data samples in the other domain.

We ran our algorithm 20 times for each object-recognition task and report the average accuracy rate (%) and standard deviation (%). In all of our experiments, we consistently set the parameter \gamma to 20 and N to 5. We show in Section IV-D3 that the performance of the proposed algorithm is not very sensitive to these two parameters.

TABLE II: DISTANCE MATRIX BETWEEN CLASSES ACROSS SOURCE AND TARGET DOMAINS

TABLE III: RESULTS OF THE TARGET DATA SAMPLES LABELING

B. Evaluation of the Distance Used in the Proposed Algorithm

The distance matrix in Table II is given to show that the subspace-based distance is suitable for our method. In this table, we use the data from the object recognition dataset. Each column denotes the distance between a specific class (C1, ..., C10) in the source domain and a class (C1, ..., C10) in the target domain. Note that all the numbers are averaged over all 12 pairs of source and target domains (described in Section IV-D1). We can see that the distances between the same class across two domains are relatively smaller than those between different classes. Therefore, the results also support the assumption that samples from the same class share the same subspace even though they are from different domains, i.e., the distance between subspaces of a specific class across the source and target domains tends to be smaller than that between different classes.

C. Evaluation on Anchor Subspaces Labeling

We also evaluate the effectiveness of the proposed anchor subspace labeling algorithm introduced in Section III-D. In Table III, "Labeled #" denotes the number of target data samples which are used for generating the anchor subspaces and


TABLE IV: RESULTS OF SINGLE SOURCE AND TARGET DOMAIN ON THE OBJECT RECOGNITION DATASET. "−" DENOTES THAT THERE IS NO RESULT REPORTED BEFORE

then assigned labels, and "Labeling Accuracy" denotes the labeling accuracy (%). "Baseline (SVM)" denotes the accuracy obtained by directly applying the SVM classifier trained on the source domain to the target domain. As shown in Table III, the number of target data samples used for labeling (the "Labeled #" column) is only about 10% of the total number of target data samples (the "Total #" column), and the labeling accuracies are not perfect (the "Accuracy" column). This shows that the proposed algorithm benefits from involving information from the target domain by labeling the target data samples, even though the number of labeled target data samples is relatively small and the labeling accuracies are not perfect. Notably, the accuracy of the proposed method is even better than that of the label assignment in the case of W → D. We argue that the labeled target data samples provide noisy information from the target domain, and the SVM classifier absorbs this noisy target information in a noise-tolerant manner, improving the performance over the classifier that uses no target information.

D. Cross-Domain Recognition on Object Dataset

Following the feature extraction procedure for each image in [11], we first use a SURF [49] detector to extract points of interest from each image. We then randomly select a subset of the points of interest and quantize their descriptors into 800 visual words using K-means clustering. Finally, we construct an 800-D feature vector for each image using the bag-of-visual-words technique.
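The quantization and histogram steps of this pipeline can be sketched as follows; the codebook stands in for the 800 K-means visual words, and all names are illustrative:

```python
import numpy as np

def bag_of_visual_words(descriptors, codebook):
    """Quantize local descriptors against a codebook and histogram them.

    descriptors: (n, d) local features of one image (e.g., SURF).
    codebook: (V, d) visual words. Each image becomes a V-dimensional
    normalized histogram of nearest-codeword assignments.
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)          # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```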

1) Single Source Domain and Single Target Domain: We report the results on all 12 possible pairs of source- and target-domain combinations. We compare our algorithm with nine other methods, including K-singular value decomposition [47], sampling geodesic flow (SGF) [8], GFK [9], Metric [26], information-theoretical learning [48], subspace interpolation [11], subspace alignment [36], domain adaptation by shifting covariance [13], and transfer joint matching [14]. We also give the accuracy rate obtained by directly applying the SVM trained on the source domain to the target domain. The results in Table IV are obtained from previous papers, mostly by the original authors. It can be seen that our algorithm performs best in 9 out of 12 domain pairs. In particular, in four domain pairs, i.e., C→A, C→D, A→C, and C→W, our algorithm significantly outperforms (by more than 5%) all the comparison methods. Our algorithm shows performance comparable with the best-performing method in the other three domain pairs. Note that the "Metric" method [26] is a semisupervised method.

2) Multiple Source/Target Domains: We then evaluate the performance when there are multiple source/target domains. For a fair comparison with other methods, we again take the results directly from the previous literature. Thus, following [34], we only conduct multiple source/target domain cross-domain recognition on six different source- and target-domain combinations, among which three combinations include two source domains and one target domain, and the other three combinations include one source domain and two target domains. When there are multiple source/target domains, we simply merge the data samples in all the source/target domains into a single domain.


TABLE V: RESULTS OF MULTISOURCE DOMAIN ADAPTATION ON THE OBJECT RECOGNITION DATASET. NOTE THAT ALL THE COMPARISON METHODS ARE SEMISUPERVISED DOMAIN ADAPTATION METHODS EXCEPT THE ONE MARKED "US"

When there are multiple source domains, we report the results of six comparison methods: SGF [8], robust domain adaptation with low-rank reconstruction (RDALR) [10], Fisher discrimination dictionary learning (FDDL) [50], shared domain-adapted dictionary learning (SDDL) [12], hierarchical matching pursuit [51], and the model in [34], as shown in Table V. For SGF [8], we report its performance under both the unsupervised and semisupervised settings. Note that RDALR, FDDL, and SDDL are all semisupervised methods, while our proposed method is unsupervised. It is clear that the proposed method significantly outperforms all the comparison methods.

In principle, using multiple source domains should provide more information for each class, which should result in higher performance than using a single source domain. For example, the domain combination "W + D→A" (41.3%) shows a marginal performance improvement over the single-source-domain cases "D→A" (38.5%) and "W→A" (39.1%). In practice, however, we do not always achieve higher performance when using multiple source domains. For example, comparing the results in Tables IV and V, the performance of "D + A→W" (73.2%) lies in between the performances of the two single-source-domain cases "D→W" (89.5%) and "A→W" (42.4%). This is an interesting problem to be studied in our future work.

When there are multiple target domains, we only found two comparison methods, SGF [8] and the method from [34]. Both unsupervised and semisupervised settings were developed for SGF, while the model from [34] is semisupervised. We take the performance figures from [8] and [34] and include them in Table VI. It can be seen that the proposed algorithm performs better than both settings of SGF, and better than the model from [34] in two out of three cases.

In principle, using multiple target domains does not provide more information, since the labels are only available in the source domain. Accordingly, the performance of using multiple target domains should be the weighted (based on the number of data samples in each involved target domain) average of the performance of using each target domain separately. The results in Tables IV and VI are largely aligned with this expectation.
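The weighted-average expectation described above amounts to a one-liner; the numbers in the usage check below are made up for illustration, not taken from the tables.

```python
def expected_multi_target_accuracy(accs, sizes):
    """Weighted (by target-domain size) average of single-target accuracies.

    accs: per-target-domain accuracies (%); sizes: numbers of data
    samples in the corresponding target domains.
    """
    total = sum(sizes)
    return sum(a * s for a, s in zip(accs, sizes)) / total
```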

3) Performance Under Different Parameter Settings: There are two main parameters in the proposed algorithm: 1) the desired average size of each group constructed in the target

TABLE VI: RESULTS OF MULTITARGET DOMAIN ADAPTATION ON THE 2-D OBJECT RECOGNITION DATASET. "SS" AND "US" DENOTE THE SEMISUPERVISED AND UNSUPERVISED SETTINGS, RESPECTIVELY

Fig. 6. Results when varying the value of (a) γ, which determines the number of groups obtained by the K-means algorithm, and (b) N, which indicates the number of samples in each anchor subspace.

domain using the K-means algorithm, i.e., γ, and 2) the number of data samples in each anchor subspace, i.e., N. To investigate the sensitivity to different parameter settings, we tune each of the two parameters separately and report the performance of each setting. For each parameter setting, we report the accuracy rate in percentage for eight combinations of single source and target domains and six combinations of multiple source/target domains.

We take the value of γ in the range 5–30 with a step length of 5; the results are shown in Fig. 6(a). We can see that the performance only varies in a small range for almost all the domain combinations, except for "D, A→W."

For N, we take its value in the range 3–10 with a step length of 1; the results are reported in Fig. 6(b). The performance only varies in a small range


TABLE VII: DOMAIN ADAPTATION RESULTS ON SENTIMENT CLASSIFICATION. K: KITCHEN, D: DVD, B: BOOKS, AND E: ELECTRONICS

for almost all the domain combinations, except for D→W and D, A→W when N > 8.

Therefore, we can conclude that the performance of the proposed algorithm is not very sensitive to the two parameters γ and N.

4) Running Time: All the running times are measured on a laptop with an Intel i7-2620M CPU and 4 GB RAM. We mainly implemented our algorithm in MATLAB 2014, with several core functions implemented in C++ for speed. There are five major steps in the proposed algorithm, as introduced in Section III-B. The running times of the first and fourth steps are negligible. The second step runs the K-means algorithm, which takes an average of 0.2 s (varying with the number of target data samples). The third step, i.e., labeling the anchor subspaces, includes two substeps. The first is to calculate the affinity matrices, which reflect the distances among anchor subspaces and between interdomain subspaces, respectively; each distance calculation requires an orthogonalization and an SVD, and in total it takes an average of 0.02 s to compute these two matrices. The second substep labels the anchor subspaces based on those two affinity matrices using Algorithm 1. This algorithm is iterative, and its time complexity is O(T(C^6 + L^6)), where T is the maximal iteration number. Fortunately, the values of C and L are both small, i.e., C = 10 and L < 20 based on Table III. This algorithm takes an average of 0.3 s. The most time-consuming step in the proposed algorithm is the SVM classifier training, which takes an average of 45 s. The running times for different domain pairs vary with the number of training data samples, from 3 s for D→W to 100 s for C→A.

E. Cross-Domain Recognition on SentimentClassification Dataset

Although the proposed algorithm was originally designed for vision tasks, it can easily be applied to cross-domain tasks in other areas. In this section, as an example, we compare the proposed algorithm with seven other methods on a domain adaptation task from the natural language processing area.

We follow the same experimental setup described in [52]. In each domain, 1600 reviews, including 800 positive reviews and 800 negative reviews, are used as the training set, and the remaining 400 reviews are used as the testing set. We extract unigram and bigram features from the comment texts, and the feature dimension is reduced to 400. Finally, each comment text is represented by a 400-D feature vector using the bag-of-words technique.
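A minimal sketch of the unigram-plus-bigram counting (the subsequent reduction to 400 dimensions is omitted, and the vocabulary here is illustrative):

```python
from collections import Counter

def ngram_features(text, vocab):
    """Count unigrams and bigrams of `text` over a fixed vocabulary.

    Returns one count per vocabulary entry, in vocabulary order; entries
    may be single words or space-joined word pairs.
    """
    tokens = text.lower().split()
    grams = tokens + [' '.join(p) for p in zip(tokens, tokens[1:])]
    counts = Counter(g for g in grams if g in vocab)
    return [counts[v] for v in vocab]
```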

We conduct experiments on four pairs of source- and target-domain combinations. The same experiment was also conducted in [52]. The performance is reported in Table VII. It is clear that, overall, the proposed algorithm outperforms the other seven methods. From this, we can see that the proposed algorithm can also be used for domain adaptation problems in nonvision areas.

V. CONCLUSION

This paper introduces a new subspace-based domain adaptation algorithm. A joint subspace is constructed independently for each class, covering both the source and target domains. The joint subspace carries information not only about the intrinsic characteristics of the considered class, but also about the specificity of each domain. Classifiers are trained on these joint subspaces. The proposed algorithm has been evaluated on two widely used datasets, and comparison results show that it outperforms several existing methods on both datasets.

REFERENCES

[1] Y. Lin et al., "Cross-domain recognition by identifying compact joint subspaces," in Proc. Int. Conf. Image Process., Quebec City, QC, Canada, 2015, pp. 3461–3465.

[2] A. Torralba and A. A. Efros, "Unbiased look at dataset bias," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, 2011, pp. 1521–1528.

[3] S. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.

[4] Y. Luo, D. Tao, B. Geng, C. Xu, and S. J. Maybank, "Manifold regularized multitask learning for semi-supervised multilabel image classification," IEEE Trans. Image Process., vol. 22, no. 2, pp. 523–536, Feb. 2013.

[5] Y. Luo, T. Liu, D. Tao, and C. Xu, "Decomposition-based transfer distance metric learning for image classification," IEEE Trans. Image Process., vol. 23, no. 9, pp. 3789–3801, Sep. 2014.

[6] W. Liu, D. Tao, J. Cheng, and Y. Tang, "Multiview Hessian discriminative sparse coding for image annotation," Comput. Vis. Image Understand., vol. 118, pp. 50–60, Jan. 2014.

[7] B. Li, X. Zhu, R. Li, and C. Zhang, "Rating knowledge sharing in cross-domain collaborative filtering," IEEE Trans. Cybern., vol. 45, no. 5, pp. 1068–1082, May 2015.

[8] R. Gopalan, R. Li, and R. Chellappa, "Domain adaptation for object recognition: An unsupervised approach," in Proc. IEEE Int. Conf. Comput. Vis., Barcelona, Spain, 2011, pp. 999–1006.

[9] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, 2012, pp. 2066–2073.

[10] I.-H. Jhuo, D. Liu, D. T. Lee, and S.-F. Chang, "Robust visual domain adaptation with low-rank reconstruction," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Providence, RI, USA, 2012, pp. 2168–2175.

[11] J. Ni, Q. Qiu, and R. Chellappa, "Subspace interpolation via dictionary learning for unsupervised domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Portland, OR, USA, 2013, pp. 692–699.

[12] S. Shekhar, V. M. Patel, H. V. Nguyen, and R. Chellappa, "Generalized domain-adaptive dictionaries," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Portland, OR, USA, 2013, pp. 361–368.

[13] Z. Cui et al., "Flowing on Riemannian manifold: Domain adaptation by shifting covariance," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2264–2273, Dec. 2014.

[14] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, "Transfer joint matching for unsupervised domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, 2014, pp. 1410–1417.


[15] E. Elhamifar and R. Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2765–2781, Nov. 2013.

[16] R. Basri and D. W. Jacobs, "Lambertian reflectance and linear subspaces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 3, pp. 218–233, Feb. 2003.

[17] T. Hastie and P. Y. Simard, "Metrics and models for handwritten character recognition," Stat. Sci., vol. 13, no. 1, pp. 54–65, 1998.

[18] J. J. DiCarlo, D. Zoccolan, and N. C. Rust, "How does the brain solve visual object recognition?" Neuron, vol. 73, no. 3, pp. 415–434, Feb. 2012.

[19] V. M. Patel, R. Gopalan, R. Li, and R. Chellappa, "Visual domain adaptation: A survey of recent advances," IEEE Signal Process. Mag., vol. 32, no. 3, pp. 53–69, May 2015.

[20] D. Tao, X. Tang, X. Li, and X. Wu, "Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1088–1099, Jul. 2006.

[21] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700–1715, Oct. 2007.

[22] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, "Transductive face sketch-photo synthesis," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 9, pp. 1364–1376, Sep. 2013.

[23] J. Yu, Y. Rui, and D. Tao, "Click prediction for Web image reranking using multimodal sparse coding," IEEE Trans. Image Process., vol. 23, no. 5, pp. 2019–2032, May 2014.

[24] J. Yu, D. Tao, M. Wang, and Y. Rui, "Learning to rank using user clicks and visual features for image retrieval," IEEE Trans. Cybern., vol. 45, no. 4, pp. 767–779, Apr. 2015.

[25] H. Daumé, III, "Frustratingly easy domain adaptation," in Proc. Conf. Assoc. Comput. Linguist., 2007, pp. 256–263.

[26] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Proc. Eur. Conf. Comput. Vis., 2010, pp. 213–226.

[27] A. Kumar, A. Saha, and H. Daumé, "Co-regularization based semi-supervised domain adaptation," in Proc. Adv. Neural Inf. Process. Syst., 2010, pp. 478–486.

[28] M. Chen, K. Q. Weinberger, and J. C. Blitzer, "Co-training for domain adaptation," in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2456–2464.

[29] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011.

[30] L. Duan, I. W. Tsang, and D. Xu, "Domain transfer multiple kernel learning," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 3, pp. 465–479, Mar. 2012.

[31] J. Donahue, J. Hoffman, E. Rodner, K. Saenko, and T. Darrell, "Semi-supervised domain adaptation with instance constraints," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Portland, OR, USA, 2013, pp. 668–675.

[32] J. Donahue et al., "DeCAF: A deep convolutional activation feature for generic visual recognition," in Proc. Int. Conf. Mach. Learn., Beijing, China, 2014, pp. 647–655.

[33] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 1, pp. 54–66, Jan. 2015.

[34] R. Gopalan, R. Li, and R. Chellappa, "Unsupervised adaptation across domain shifts by generating intermediate data representations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 11, pp. 2288–2302, Nov. 2014.

[35] R. Caseiro, J. F. Henriques, P. Martins, and J. Batista, "Beyond the shortest path: Unsupervised domain adaptation by sampling subspaces along the spline flow," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Boston, MA, USA, 2015, pp. 3846–3854.

[36] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proc. IEEE Int. Conf. Comput. Vis., Sydney, NSW, Australia, 2013, pp. 2960–2967.

[37] T. Tommasi and B. Caputo, "Frustratingly easy NBNN domain adaptation," in Proc. IEEE Int. Conf. Comput. Vis., Sydney, NSW, Australia, 2013, pp. 897–904.

[38] M. Baktashmotlagh, M. T. Harandi, B. C. Lovell, and M. Salzmann, "Domain adaptation on the statistical manifold," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, 2014, pp. 2481–2488.

[39] V. Zografos, L. Ellis, and R. Mester, "Discriminative subspace clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Portland, OR, USA, 2013, pp. 2107–2114.

[40] J. Hamm and D. D. Lee, "Grassmann discriminant analysis: A unifying view on subspace-based learning," in Proc. Int. Conf. Mach. Learn., Helsinki, Finland, 2008, pp. 376–383.

[41] D. Zhai et al., "Parametric local multimodal hashing for cross-view similarity search," in Proc. Int. Joint Conf. Artif. Intell., Beijing, China, 2013, pp. 2754–2760.

[42] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Comput., vol. 15, no. 6, pp. 1373–1396, 2003.

[43] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proc. Int. Conf. Mach. Learn., Washington, DC, USA, 2003, pp. 912–919.

[44] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, Tech. Univ. Denmark, Kongens Lyngby, Denmark, 2012.

[45] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," California Inst. Technol., Pasadena, CA, USA, Tech. Rep. 7694, 2007.

[46] J. Blitzer, M. Dredze, and F. Pereira, "Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification," in Proc. Conf. Assoc. Comput. Linguist., Prague, Czech Republic, 2007, pp. 440–447.

[47] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Trans. Signal Process., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.

[48] Y. Shi and F. Sha, "Information-theoretical learning of discriminative clusters for unsupervised domain adaptation," in Proc. Int. Conf. Mach. Learn., Edinburgh, U.K., 2012, pp. 1–8.

[49] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, "Speeded-up robust features (SURF)," Comput. Vis. Image Understand., vol. 110, no. 3, pp. 346–359, 2008.

[50] M. Yang, L. Zhang, X. Feng, and D. Zhang, "Fisher discrimination dictionary learning for sparse representation," in Proc. IEEE Int. Conf. Comput. Vis., Barcelona, Spain, 2011, pp. 543–550.

[51] L. Bo, X. Ren, and D. Fox, "Hierarchical matching pursuit for image classification: Architecture and fast algorithms," in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 2115–2123.

[52] B. Gong, K. Grauman, and F. Sha, "Connecting the dots with landmarks: Discriminatively learning domain-invariant features for unsupervised domain adaptation," in Proc. Int. Conf. Mach. Learn., Atlanta, GA, USA, 2013, pp. 222–230.

[53] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proc. Conf. Empir. Methods Nat. Lang. Process., Sydney, NSW, Australia, 2006, pp. 120–128.

[54] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proc. Adv. Neural Inf. Process. Syst., Vancouver, BC, Canada, 2007, pp. 601–608.

Yuewei Lin (S'09) received the B.S. degree in optical information science and technology from Sichuan University, Chengdu, China, and the M.E. degree in optical engineering from Chongqing University, Chongqing, China. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA.

His current research interests include computer vision, machine learning, and image/video processing.

Jing Chen received the B.S. degree in computational mathematics from Shandong University, Jinan, China, and the M.S. and Ph.D. degrees in computer science and technology from Chongqing University, Chongqing, China.

He is currently a Post-Doctoral Fellow with the Faculty of Science and Technology, University of Macau, Macau, China. His current research interests include machine learning, pattern recognition, and biomedical information processing and analysis.



Yu Cao (M'14) received the B.S. degree in information and computation science and the M.S. degree in applied mathematics from Northeastern University, Shenyang, China, in 2003 and 2007, respectively, and the Ph.D. degree in computer science and engineering from the University of South Carolina, Columbia, SC, USA, in 2013.

He is currently a Post-Doctoral Researcher with the IBM Almaden Research Center, San Jose, CA, USA. His current research interests include computer vision, machine learning, pattern recognition, and medical image processing.

Youjie Zhou received the B.S. degree in software engineering from East China Normal University (ECNU), Shanghai, China, in 2010. He is currently pursuing the Ph.D. degree in computer science and engineering with the University of South Carolina, Columbia, SC, USA.

He is a Research Assistant with the Computer Vision Laboratory, University of South Carolina. From 2007 to 2010, he was a Research Assistant with the Institute of Massive Computing, ECNU, where he researched multimedia news exploration and retrieval. His current research interests include computer vision, machine learning, and large-scale multimedia analysis.

Lingfeng Zhang (S'15) received the B.S. degree in mathematics and the M.S. degree in computer science from Chongqing University, Chongqing, China, in 2009 and 2012, respectively. He is currently pursuing the Ph.D. degree with the University of Houston, Houston, TX, USA.

His current research interests include machine learning and big data analysis.

Yuan Yan Tang (F'04) received the Ph.D. degree in computer science from Concordia University, Montreal, QC, Canada, in 1990.

He is a Chair Professor with the Faculty of Science and Technology, University of Macau, Macau, China, and a Professor/Adjunct Professor/Honorary Professor at several institutes in China, USA, Canada, France, and Hong Kong. He has published over 400 academic papers and authored/co-authored over 25 monographs/books/book chapters. His current research interests include wavelets, pattern recognition, image processing, and artificial intelligence.

Dr. Tang is the Founder and the Editor-in-Chief of the International Journal on Wavelets, Multiresolution, and Information Processing and an Associate Editor of several international journals. He is the Founder and the Chair of the Pattern Recognition Committee in the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS and of the Macau Branch of the International Association for Pattern Recognition (IAPR). He is the Founder and the General Chair of the series of International Conferences on Wavelet Analysis and Pattern Recognition. He has served as the General Chair, the Program Chair, and a Committee Member for several international conferences. He is a Fellow of the IAPR.

Song Wang (SM'13) received the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign (UIUC), Urbana, IL, USA, in 2002.

From 1998 to 2002, he was a Research Assistant with the Image Formation and Processing Group, Beckman Institute, UIUC. In 2002, he joined the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA, where he is currently a Professor. His current research interests include computer vision, medical image processing, and machine learning.

Prof. Wang currently serves as the Publicity/Web Portal Chair of the Technical Committee of Pattern Analysis and Machine Intelligence, the IEEE Computer Society, and an Associate Editor of Pattern Recognition Letters.