Tensor Canonical Correlation Analysis and Its applications
Presenter: Yong LUO
This work was done while Yong LUO was a Research Fellow at Nanyang Technological University, Singapore


Page 1: Tensor Canonical Correlation Analysis and Its applications

Tensor Canonical Correlation Analysis and Its applications

Presenter: Yong LUO

This work was done while Yong LUO was a Research Fellow at Nanyang Technological University, Singapore

Page 2:

Outline

• Y. Luo, D. C. Tao, R. Kotagiri, C. Xu, and Y. G. Wen, “Tensor Canonical Correlation Analysis for Multi-view Dimension Reduction,” IEEE Transactions on Knowledge and Data Engineering (T-KDE), vol. 27, no. 11, pp. 3111-3124, 2015.

• Y. Luo, Y. G. Wen and D. C. Tao, “On Combining Side Information and Unlabeled Data for Heterogeneous Multi-task Metric Learning,” International Joint Conference on Artificial Intelligence (IJCAI), pp. 1809-1815, 2016.

Page 3:

Multi-view dimension reduction (MVDR)

• Dimension reduction (DR)
  • Find a low-dimensional representation for high-dimensional data
  • Benefits: reduces the chance of over-fitting, reduces computational cost, etc.
  • Approaches: feature selection (IG, MI, sparse learning, etc.), feature transformation (PCA, LDA, LE, etc.)

Page 4:

MVDR

• Real-world objects usually contain information from multiple sources, and different kinds of features can be extracted from them

• Traditional DR methods cannot effectively handle multiple types of features

[Figure: naive fusion by feature concatenation]

Page 5:

MVDR

• Multi-view learning
  • Learn to fuse multiple distinct feature representations
  • Families: weighted view combination, multi-view dimension reduction, view agreement exploration

• Multi-view dimension reduction
  • Multi-view feature selection
  • Multi-view subspace learning: seek a low-dimensional common subspace to compactly represent the heterogeneous data; one of the most representative models is CCA

Page 6:

Canonical correlation analysis (CCA)

• Objective of CCA
  • Correlation maximization on the common subspace, with canonical variables z_1n = x_1n^T h_1 and z_2n = x_2n^T h_2:

argmax_{h_1, h_2}  ρ = corr(z_1, z_2) = (h_1^T C_12 h_2) / sqrt((h_1^T C_11 h_1)(h_2^T C_22 h_2))

[Figure: the two views x_1 and x_2 are projected onto a common variable z]

H. Hotelling, “Relations between two sets of variates,” Biometrika, 1936.
D. P. Foster, et al., “Multi-view dimensionality reduction via canonical correlation analysis,” Tech. Rep., 2008.
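As a concrete reference (my own sketch, not from the slides), two-view CCA can be solved by whitening the covariances and taking an SVD. The NumPy code below assumes columns are samples and adds a small ridge term eps for invertibility; the top singular value is the maximized canonical correlation ρ.

```python
import numpy as np

def cca(X1, X2, eps=1e-6):
    """Two-view CCA: find h1, h2 maximizing corr(X1^T h1, X2^T h2).
    X1: d1 x N, X2: d2 x N; columns are samples."""
    N = X1.shape[1]
    X1 = X1 - X1.mean(axis=1, keepdims=True)
    X2 = X2 - X2.mean(axis=1, keepdims=True)
    C11 = X1 @ X1.T / N + eps * np.eye(X1.shape[0])
    C22 = X2 @ X2.T / N + eps * np.eye(X2.shape[0])
    C12 = X1 @ X2.T / N
    # Whiten both views, then take the leading singular pair of
    # C11^{-1/2} C12 C22^{-1/2} (here via Cholesky factors).
    W1 = np.linalg.inv(np.linalg.cholesky(C11))
    W2 = np.linalg.inv(np.linalg.cholesky(C22))
    U, s, Vt = np.linalg.svd(W1 @ C12 @ W2.T)
    h1 = W1.T @ U[:, 0]
    h2 = W2.T @ Vt[0, :]
    return h1, h2, s[0]   # s[0] is the top canonical correlation

np.random.seed(0)
X1 = np.random.randn(3, 200)
X2 = np.random.randn(4, 3) @ X1   # view 2 is a linear function of view 1
h1, h2, rho = cca(X1, X2)
print(rho)                        # close to 1 for perfectly correlated views
```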

Page 7:

Generalizations of CCA to several views

• CCA-MAXVAR
  • Generalizes CCA to M ≥ 2 views
  • z_m = X_m^T h_m is the vector of canonical variables for the m-th view, and z is a centroid representation
  • Solutions can be obtained using the SVD of X_m

argmin_{z, α_m, h_m}  (1/M) Σ_{m=1}^{M} ||z − α_m z_m||_2^2,   s.t. ||z_m||_2 = 1

J. R. Kettenring, “Canonical analysis of several sets of variables,” Biometrika, 1971.

Page 8:

Generalizations of CCA to several views

• CCA-LS
  • Equivalent to CCA-MAXVAR, but can be solved efficiently and adaptively based on LS regression

argmin_{h_m}  1/(2M(M−1)) Σ_{p,q=1}^{M} ||X_p^T h_p − X_q^T h_q||_2^2,
s.t.  (1/M) Σ_{m=1}^{M} h_m^T C_mm h_m = 1

J. Via et al., “A learning algorithm for adaptive canonical correlation analysis of several data sets,” Neural Networks, 2007.

Page 9:

The proposed TCCA framework

• Main drawback of CCA-MAXVAR and CCA-LS
  • Only the statistics (correlation information) between pairs of features are explored, while high-order statistics are ignored

• Tensor CCA
  • Directly maximizes the high-order correlation between all views

[Figure: pairwise correlations between views x_1, x_2, x_3 (of dimensions d_1, d_2, d_3) vs. a single high-order tensor correlation over all three views]

Page 10:

The proposed TCCA framework for MVDR

[Figure: three views X_1, X_2, X_3 (e.g., LAB, WT, and SIFT features of dimensions d_1, d_2, d_3 over N samples) form the covariance tensor 𝒞_123, which is approximated by a sum of rank-1 terms λ^(1) u_1^(1) ∘ u_2^(1) ∘ u_3^(1) + ⋯ + λ^(r) u_1^(r) ∘ u_2^(r) ∘ u_3^(r); the resulting factors U_1, U_2, U_3 map the views to Z_1, Z_2, Z_3 (each N × r), which are combined into the 3r-dimensional common representation Z]

Page 11:

Tensor basics

• A tensor is a multi-dimensional array; an order-n tensor generalizes vectors and matrices to n index modes
  • Scalar: order-0 tensor
  • Vector: order-1 tensor
  • Matrix: order-2 tensor
  • Order-3 tensor: a three-way array

Page 12:

Tensor basics

• Tensor-matrix multiplication
  • The m-mode product of an I_1 × I_2 × ⋯ × I_M tensor 𝒜 and a J_m × I_m matrix U is a tensor ℬ = 𝒜 ×_m U of size I_1 × ⋯ × I_{m−1} × J_m × I_{m+1} × ⋯ × I_M with the elements

ℬ(i_1, ⋯, i_{m−1}, j_m, i_{m+1}, ⋯, i_M) = Σ_{i_m=1}^{I_m} 𝒜(i_1, i_2, ⋯, i_M) U(j_m, i_m)

  • The product of 𝒜 and a sequence of matrices {U_m ∈ R^{J_m × I_m}}_{m=1}^{M} is

ℬ = 𝒜 ×_1 U_1 ×_2 U_2 ⋯ ×_M U_M
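The m-mode product is easy to express with NumPy's tensordot; the sketch below (mine, not the authors') treats modes as 0-based axes.

```python
import numpy as np

def mode_m_product(A, U, m):
    # B = A x_m U: contract mode m of A (size I_m) with the columns of U
    # (U has shape J_m x I_m), leaving the new size J_m in position m.
    return np.moveaxis(np.tensordot(U, A, axes=(1, m)), 0, m)

A = np.random.rand(3, 4, 5)   # I_1 x I_2 x I_3
U = np.random.rand(6, 4)      # J_2 x I_2, here m = 1 (0-based)
B = mode_m_product(A, U, 1)
print(B.shape)                # (3, 6, 5)
```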

Page 13:

Tensor basics

• Tensor-vector multiplication
  • The contracted m-mode product of 𝒜 and an I_m-vector u is an I_1 × ⋯ × I_{m−1} × I_{m+1} × ⋯ × I_M tensor ℬ = 𝒜 ×̄_m u of order M − 1 with the entries

ℬ(i_1, ⋯, i_{m−1}, i_{m+1}, ⋯, i_M) = Σ_{i_m=1}^{I_m} 𝒜(i_1, i_2, ⋯, i_M) u(i_m)

• Tensor-tensor multiplication
  • Outer product, contracted product, inner product

• Frobenius norm of a tensor

||𝒜||_F^2 = ⟨𝒜, 𝒜⟩ = Σ_{i_1=1}^{I_1} Σ_{i_2=1}^{I_2} ⋯ Σ_{i_M=1}^{I_M} 𝒜(i_1, i_2, ⋯, i_M)^2
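The contracted m-mode product and the Frobenius norm can likewise be checked numerically; this is a small illustrative sketch, not the authors' code.

```python
import numpy as np

def contracted_mode_product(A, u, m):
    # A x̄_m u: sum out mode m of A against the vector u, producing an
    # order-(M-1) tensor.
    return np.tensordot(A, u, axes=(m, 0))

A = np.random.rand(2, 3, 4)
u = np.random.rand(3)
B = contracted_mode_product(A, u, 1)   # shape (2, 4)

# Frobenius norm: ||A||_F^2 = <A, A> = sum over all squared entries
fro_sq = np.sum(A ** 2)
```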

Page 14:

Tensor basics

• Matricization
  • The mode-m matricization of 𝒜 is an I_m × (I_1 ⋯ I_{m−1} I_{m+1} ⋯ I_M) matrix, denoted A_(m)

[Figure: an order-3 tensor 𝒜 unfolded into A_(1) (mode-1, horizontal matricizing), A_(2) (mode-2, frontal matricizing), and A_(3); the unfoldings correspond to row-wise or column-wise vectorizing of the slices]

Page 15:

Tensor basics

• Matricization property
  • The m-mode multiplication ℬ = 𝒜 ×_m U can be carried out as a matrix multiplication by storing the tensors in matricized form, i.e., B_(m) = U A_(m)
  • A series of m-mode products can be expressed using Kronecker products: for

ℬ = 𝒜 ×_1 U_1 ×_2 U_2 ⋯ ×_M U_M

we have

B_(m) = U_m A_(m) (U_{c_1} ⊗ U_{c_2} ⊗ ⋯ ⊗ U_{c_{M−1}})^T

where (c_1, c_2, ⋯, c_{M−1}) = (m+1, m+2, ⋯, M, 1, 2, ⋯, m−1) is a forward cyclic ordering of the tensor-dimension indices
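The Kronecker identity can be verified numerically. One caveat, from me rather than the slides: the order of the Kronecker factors depends on the unfolding convention. With the C-order unfolding used below, the remaining factors appear in increasing mode order; the slides' forward cyclic ordering permutes the factors but is otherwise the same identity.

```python
import numpy as np

def unfold(A, m):
    # Mode-m matricization: I_m x (product of the other dims), C-order.
    return np.moveaxis(A, m, 0).reshape(A.shape[m], -1)

def mode_m_product(A, U, m):
    return np.moveaxis(np.tensordot(U, A, axes=(1, m)), 0, m)

np.random.seed(1)
A = np.random.rand(2, 3, 4)
U1, U2, U3 = np.random.rand(5, 2), np.random.rand(6, 3), np.random.rand(7, 4)

B = mode_m_product(mode_m_product(mode_m_product(A, U1, 0), U2, 1), U3, 2)

# Matricization property for mode 1 (0-based index 0):
# B_(1) = U1 A_(1) (U2 kron U3)^T with this unfolding convention.
lhs = unfold(B, 0)
rhs = U1 @ unfold(A, 0) @ np.kron(U2, U3).T
print(np.allclose(lhs, rhs))  # True
```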

Page 16:

TCCA formulation

• Optimization problem
  • Maximize the correlation between the canonical variables z_m = X_m^T h_m, m = 1, ⋯, M:

argmax_{h_m}  ρ = corr(z_1, z_2, ⋯, z_M) = (z_1 ⊙ z_2 ⊙ ⋯ ⊙ z_M)^T e,
s.t.  z_m^T z_m = 1,  m = 1, ⋯, M

• Equivalent formulation
  • Covariance tensor: 𝒞_{12⋯M} = (1/N) Σ_{n=1}^{N} x_1n ∘ x_2n ∘ ⋯ ∘ x_Mn

argmax_{h_m}  ρ = 𝒞_{12⋯M} ×̄_1 h_1^T ×̄_2 h_2^T ⋯ ×̄_M h_M^T,
s.t.  h_m^T (C_mm + εI) h_m = 1,  m = 1, ⋯, M
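The covariance tensor in the equivalent formulation is just a sum of outer products, which is a one-line einsum; shapes below are illustrative only.

```python
import numpy as np

# Covariance tensor C_{123} = (1/N) * sum_n x_1n ∘ x_2n ∘ x_3n for
# three (centered) views; columns of each X_m are samples.
np.random.seed(0)
N, d1, d2, d3 = 50, 3, 4, 5
X1, X2, X3 = (np.random.randn(d, N) for d in (d1, d2, d3))
X1, X2, X3 = (X - X.mean(axis=1, keepdims=True) for X in (X1, X2, X3))
C = np.einsum('an,bn,cn->abc', X1, X2, X3) / N
print(C.shape)  # (3, 4, 5)
```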

Page 17:

TCCA formulation

• Reformulation
  • Let ℳ = 𝒞_{12⋯M} ×_1 C̃_11^{−1/2} ×_2 C̃_22^{−1/2} ⋯ ×_M C̃_MM^{−1/2} and u_m = C̃_mm^{1/2} h_m, where C̃_mm = C_mm + εI:

argmax_{u_m}  ρ = ℳ ×̄_1 u_1^T ×̄_2 u_2^T ⋯ ×̄_M u_M^T,
s.t.  u_m^T u_m = 1,  m = 1, ⋯, M

• Main solution
  • Defining the rank-1 tensor ℳ̂ = ρ u_1 ∘ u_2 ∘ ⋯ ∘ u_M, the problem becomes

argmin_{u_m}  ||ℳ − ℳ̂||_F^2   [Lathauwer et al., 2000a]

  • Solved by alternating least squares (ALS), the higher-order power method (HOPM), etc.

L. De Lathauwer et al., “On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors,” SIAM J. Matrix Anal. Appl., 2000.
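As an illustration of the rank-1 step, here is a small higher-order power method for an order-3 tensor; in TCCA the whitened tensor ℳ would be the input. This is my own sketch of the standard HOPM/ALS iteration, not the authors' code.

```python
import numpy as np

def hopm_rank1(T, iters=100):
    """Best rank-1 approximation T ≈ rho * u1 ∘ u2 ∘ u3 via the
    higher-order power method (alternating updates + renormalization)."""
    u1, u2, u3 = (np.ones(d) / np.sqrt(d) for d in T.shape)
    for _ in range(iters):
        u1 = np.einsum('abc,b,c->a', T, u2, u3); u1 /= np.linalg.norm(u1)
        u2 = np.einsum('abc,a,c->b', T, u1, u3); u2 /= np.linalg.norm(u2)
        u3 = np.einsum('abc,a,b->c', T, u1, u2); u3 /= np.linalg.norm(u3)
    rho = np.einsum('abc,a,b,c->', T, u1, u2, u3)
    return rho, u1, u2, u3

# Recover a planted rank-1 tensor with weight 2
np.random.seed(0)
a, b, c = (v / np.linalg.norm(v) for v in np.random.randn(3, 4))
T = 2.0 * np.einsum('a,b,c->abc', a, b, c)
rho, u1, u2, u3 = hopm_rank1(T)
print(round(rho, 6))  # 2.0
```

Further components (the "remaining solutions" on the next slide) would be obtained by adding more rank-1 terms, i.e., a CP decomposition.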

Page 18:

TCCA solution

• Solutions
  • Remaining solutions: recursively maximize the same correlation as in the main TCCA problem
  • All solutions together: the best sum-of-rank-1 approximation, i.e., the rank-r CP decomposition of ℳ

ℳ ≈ Σ_{k=1}^{r} ρ_k u_1^(k) ∘ u_2^(k) ∘ ⋯ ∘ u_M^(k)

• Projected data

Z_m = X_m^T C̃_mm^{−1/2} U_m,  where U_m = [u_m^(1), ⋯, u_m^(r)]

Page 19:

KTCCA formulation

• Non-linear extension
  • Non-linear feature mapping φ: φ(X_m) = [φ(x_m1), φ(x_m2), ⋯, φ(x_mN)]
  • Canonical variables: z_m = φ(X_m)^T h_m
  • Representer theorem: h_m = φ(X_m) a_m

• Optimization problem

argmax_{a_m}  ρ = 𝒦_{12⋯M} ×̄_1 a_1^T ×̄_2 a_2^T ⋯ ×̄_M a_M^T,
s.t.  a_m^T (K_mm^2 + εK_mm) a_m = 1,  m = 1, ⋯, M

where the constraint matrix factorizes as K_mm^2 + εK_mm = L_m^T L_m

Page 20:

KTCCA solution

• Reformulation
  • Let 𝒮 = 𝒦_{12⋯M} ×_1 K̃_11^{−1/2} ×_2 K̃_22^{−1/2} ⋯ ×_M K̃_MM^{−1/2} and b_m = K̃_mm^{1/2} a_m, where K̃_mm = K_mm^2 + εK_mm:

argmax_{b_m}  ρ = 𝒮 ×̄_1 b_1^T ×̄_2 b_2^T ⋯ ×̄_M b_M^T,
s.t.  b_m^T b_m = 1,  m = 1, ⋯, M

• Solved by ALS

• Projected data: Z_m = K_mm L_m^{−1} B_m,  m = 1, ⋯, M

Page 21:

Experimental setup

• Datasets
  • SecStr: protein secondary structure prediction
    • 84K instances, 100 as labeled, plus an additional 1200K unlabeled
    • 3 views: attributes based on the left, middle, and right context generated from the sequence window of amino acids; each view is 105-D
  • Advertisement classification
    • 3279 instances, 100 as labeled
    • 3 views: features based on the terms in the images (588-D), terms in the current URL (495-D), and terms in the anchor URL (472-D)
  • Web image annotation
    • 11189 images, {4, 6, 8} labeled instances for each of 10 concepts
    • 3 views: 500-D SIFT visual words, 144-D color, 128-D wavelet

• Classifiers: RLS and KNN
• Evaluation criterion: prediction/classification/annotation accuracy

Page 22:

Experimental setup

• Compared methods
  • BSF: best single-view feature
  • CAT: concatenation of the normalized features
  • FRAC: a recent multi-view feature selection algorithm
  • CCA: applied to each of the m(m−1)/2 subsets of two views
    • CCA (BST): the best subset
    • CCA (AVG): the average performance over all subsets
  • CCA-LS: traditional generalization of CCA to several views
  • DSE: a popular unsupervised multi-view DR method
  • SSMVD: a recent unsupervised multi-view DR method
  • TCCA: the proposed method

Page 23:

Experimental results and analysis

• Protein secondary structure prediction
  • Learning a common subspace > CAT > BSF
  • SSMVD and CCA-LS are comparable, as are DSE and CCA (BST)
  • TCCA is the best at most dimensions, and its accuracy does not decrease significantly when the dimension is high

[Figure: accuracy vs. subspace dimension, with 84K and 1.3M unlabeled samples]

Page 24:

Experimental results and analysis

• Web image annotation
  • DSE is comparable to CCA (BST) and CCA (AVG)
  • TCCA > SSMVD, and is better than the other CCA-based methods
  • Non-linear > linear

[Figure: annotation accuracy for the linear and the non-linear methods]

Page 25:

Conclusions and discussion

• Conclusions
  • Finding a common subspace for all views using a CCA-based strategy is often better than simply concatenating all the features, especially when the feature dimension is high
  • Examining more statistics, which may require utilizing more unlabeled data, often leads to better performance; by exploring the high-order statistics, the proposed TCCA outperforms the other methods

• Discussion
  • Can the common subspace be used for knowledge transfer between different views?

Page 26:

Distance metric learning (DML)

• Goal: learn an appropriate distance function over the input space to reflect relationships between data

• Useful in many ML algorithms, e.g., clustering, classification and information retrieval

• Most common DML scheme: Mahalanobis metric learning, which amounts to learning a linear transformation U:

d_A(x_i, x_j) = (x_i − x_j)^T A (x_i − x_j) = ||U^T x_i − U^T x_j||_2^2,  A = U U^T

• Non-linear and local DML are able to capture complex structure in the data
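The equivalence between the Mahalanobis form and the linear-map form is easy to verify numerically; dimensions in this small check are illustrative.

```python
import numpy as np

# Mahalanobis distance d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j) with
# the PSD factorization A = U U^T: a squared Euclidean distance after
# the linear map x -> U^T x.
np.random.seed(0)
d, r = 5, 3
U = np.random.randn(d, r)
A = U @ U.T
xi, xj = np.random.randn(d), np.random.randn(d)

diff = xi - xj
d_A = diff @ A @ diff
d_lin = np.sum((U.T @ xi - U.T @ xj) ** 2)
print(np.allclose(d_A, d_lin))  # True
```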

Page 27:

Transfer DML (TDML)

• Motivation
  • DML needs a large amount of side information to learn a robust distance metric
  • The training samples are insufficient in the task/domain of interest (target task/domain)
  • We have abundant labeled data in certain related, but different, tasks/domains (source tasks/domains)

• Goal
  • Utilize the metrics obtained from the source tasks/domains to help metric learning in the target tasks/domains

Page 28:

Homogeneous TDML (HoTDML)

• Data of the source domain and target domain are drawn from different distributions (same feature space)

• Examples [Pan and Yang, 2010]
  • Web document classification: university websites -> new websites
  • Indoor WiFi localization: WiFi signal-strength values change over different time periods or on different devices
  • Sentiment classification: the distribution of reviews among different types of products can be very different

• Challenge
  • How to utilize the source information appropriately given the different distributions, or find a subspace in which the distribution difference is reduced

S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE TKDE, 2010.

Page 29:

Heterogeneous TDML (HeTDML)

• Data of the source domain and target domain lie in different feature spaces, and may have different semantics

• Examples
  • Multi-lingual document classification
  • Multi-view classification or retrieval, etc.

• Challenge
  • How to find correspondences or common representations for the different domains

[Figure: abundant labeled reviews in English and scarce labeled reviews in Spanish are used to classify reviews in Spanish]

Page 30:

HeTDML existing solutions

• Heterogeneous transfer learning (HTL) approaches usually transform heterogeneous features into a common subspace, and the transformation can be used to derive a metric

• Groups
  • Heterogeneous domain adaptation (HDA)
    • Improves the performance in the target domain
    • Most methods only handle two domains
  • Heterogeneous multi-task learning (HMTL)
    • Improves the performance of all domains simultaneously

Page 31:

Heterogeneous multi-task metric learning (HMTML)

• Limitations of existing HMTL approaches
  • Do not optimize w.r.t. the metric
  • Mainly focus on utilizing the side information
  • Can only explore the pairwise relationships between different domains; the high-order statistics that can only be obtained by simultaneously examining all domains are ignored

• Our method
  • Handles an arbitrary number of domains, and directly optimizes w.r.t. the metrics
  • Makes use of large amounts of unlabeled data to build domain connections
  • Explores high-order statistics between all domains

Page 32:

HMTML framework

[Figure: for each domain m, labeled data 𝒟_m^L (e.g., English, ..., German documents) yield metrics A_1 = U_1 U_1^T, ⋯, A_M = U_M U_M^T; the transformations U_1, ⋯, U_M project the unlabeled data 𝒟^U (samples x_mn^U of X_m^U) into common representations z_mn^U of Z_m^U, whose tensor-based correlation is maximized; this yields the optimal metrics A_m* = U_m* (U_m*)^T]

Page 33:

HMTML formulation

• Optimization problem
  • General formulation:

argmin_{A_m}  F({A_m}_{m=1}^{M}) = Σ_{m=1}^{M} Ψ(A_m) + γ R(A_1, A_2, ⋯, A_M),
s.t.  A_m ≽ 0,  m = 1, 2, ⋯, M

  • Ψ(A_m) = 2/(N_m(N_m − 1)) Σ_{i<j} L(A_m; x_mi, x_mj, y_mij) is the empirical loss w.r.t. A_m
  • R(A_1, A_2, ⋯, A_M) enforces information transfer across different domains

Page 34:

Knowledge transfer by high-order correlation maximization

• Main idea
  • Decompose A_m as A_m = U_m U_m^T
  • Use U_m to project the unlabeled data points of the different domains into a common subspace, where the correlation of all domains is maximized

• Formulation

argmax_{U_m}  (1/N^U) Σ_{n=1}^{N^U} corr(z_1n^U, z_2n^U, ⋯, z_Mn^U),

where z_mn^U = U_m^T x_mn^U and corr(z_1n^U, z_2n^U, ⋯, z_Mn^U) = (z_1n^U ⊙ z_2n^U ⊙ ⋯ ⊙ z_Mn^U)^T e is the correlation of the projected representations
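The high-order correlation of one projected sample is just the sum over the elementwise (Hadamard) product of the M projections. A small sketch with hypothetical shapes (mine, not from the paper):

```python
import numpy as np

def highorder_corr(zs):
    # corr(z_1, ..., z_M) = (z_1 ⊙ z_2 ⊙ ... ⊙ z_M)^T e:
    # elementwise product of all projected vectors, then a sum.
    P = np.ones_like(zs[0])
    for z in zs:
        P = P * z
    return P.sum()

# Projections z_m = U_m^T x_m of one unlabeled sample from M = 3 domains
np.random.seed(0)
dims, r = [4, 5, 6], 3
Us = [np.random.randn(d, r) for d in dims]
xs = [np.random.randn(d) for d in dims]
zs = [U.T @ x for U, x in zip(Us, xs)]
print(highorder_corr(zs))
```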

Page 35:

Knowledge transfer by high-order correlation maximization

• Reformulation

argmax_{U_m}  (1/N^U) Σ_{n=1}^{N^U} corr(z_1n^U, z_2n^U, ⋯, z_Mn^U)

⇔  argmax_{U_m}  (1/N^U) Σ_{n=1}^{N^U} 𝒢 ×̄_1 (x_1n^U)^T ⋯ ×̄_M (x_Mn^U)^T   [Luo et al., 2015]

⇔  argmin_{U_m}  (1/N^U) Σ_{n=1}^{N^U} ||𝒞_n^U − 𝒢||_F^2   [Lathauwer et al., 2000b]

where 𝒢 = ℰ_r ×_1 U_1 ×_2 U_2 ⋯ ×_M U_M is the covariance tensor of the mappings, and 𝒞_n^U is the covariance tensor of the representations for the n-th unlabeled sample.

Y. Luo et al., “Tensor Canonical Correlation Analysis for Multi-view Dimension Reduction,” IEEE TKDE, 2015.
L. De Lathauwer et al., “A multilinear singular value decomposition,” SIAM J. Matrix Anal. Appl., 2000.

Page 36:

HMTML formulation

• Specific optimization problem

argmin_{U_m}  F({U_m}_{m=1}^{M}) = Σ_{m=1}^{M} (1/N_m') Σ_{k=1}^{N_m'} g(y_mk (1 − δ_mk^T U_m U_m^T δ_mk))
              + (γ/N^U) Σ_{n=1}^{N^U} ||𝒞_n^U − 𝒢||_F^2 + Σ_{m=1}^{M} γ_m ||U_m||_1

• This corresponds to finding a subspace where the representations of all domains are close to each other

• Knowledge is transferred in this subspace, so different domains can help each other in learning the mapping U_m, or equivalently the metric A_m

Page 37:

HMTML solution

• Rewrite ||𝒞_n^U − 𝒢||_F^2 as an expression w.r.t. U_m:

𝒢 = ℰ_r ×_1 U_1 ×_2 U_2 ⋯ ×_M U_M = ℬ ×_m U_m,
where ℬ = ℰ_r ×_1 U_1 ⋯ ×_{m−1} U_{m−1} ×_{m+1} U_{m+1} ⋯ ×_M U_M

By the matricization property [Lathauwer et al., 2000a], 𝒢_(m) = U_m B_(m), so

||𝒞_n^U − 𝒢||_F^2 = ||C_{n(m)}^U − U_m B_(m)||_F^2

• Alternate over the U_m, solving each subproblem w.r.t. U_m by projected gradient descent

L. De Lathauwer et al., “On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors,” SIAM J. Matrix Anal. Appl., 2000.

Page 38:

Experiments

• Datasets and features
  • Reuters multilingual collection (RMLC)
    • 6 categories, 3 domains: English (EN), Italian (IT), Spanish (SP)
    • Number of documents: EN = 18758, IT = 24039, SP = 12342
    • TF-IDF features, with PCA preprocessing to find comparable and high-level patterns for transfer
  • NUS-WIDE
    • Subset of 12 animal concepts, 16519 images + tags
    • {SIFT, wavelet, tag} features + PCA preprocessing; each representation is a domain

• Evaluation criteria
  • Accuracy, MacroF1

Page 39:

Experiments

• Compared methods
  • EU: Euclidean distance between samples based on their original feature representations
  • RDML: an efficient and competitive DML algorithm; does not make use of any additional information from other domains
  • DAMA: constructs mappings U_m to link multiple heterogeneous domains using manifold alignment
  • MTDA: the multi-task extension of linear discriminant analysis
  • HMTML: the proposed method

Page 40:

Experiments

• Average perf. of all domains w.r.t. common factors

• Although the labeled samples in each domain are scarce, learning the distance metric separately using RDML can still improve the performance significantly

Page 41:

Experiments

• Average perf. of all domains w.r.t. common factors

• All three heterogeneous transfer learning approaches achieve much better performance than RDML. This indicates that it is useful to leverage information from other domains in DML

Page 42:

Experiments

• Average perf. of all domains w.r.t. common factors

• HMTML outperforms both DAMA and MTDA for most numbers of common factors. This indicates that the factors learned by our method are more expressive than those of the other approaches

Page 43:

Experiments

• Performance for individual domains

• RDML improves the performance in each domain, and the improvements are similar for different domains, since there is no communication between them

Page 44:

Experiments

• Performance for individual domains

• The transfer learning methods achieve much larger improvements than RDML in the domains where the discriminative ability of the original representations is not very good. This demonstrates that knowledge is successfully transferred between the different domains

Page 45:

Experiments

• Performance for individual domains

• The discriminative domain obtains little benefit from the other, relatively non-discriminative domains under DAMA and MTDA, while the proposed HMTML still achieves significant improvements

Page 46:

Conclusions

• The labeled data deficiency problem can be alleviated by learning metrics for multiple heterogeneous domains simultaneously

• The shared knowledge of different domains exploited by the transfer learning methods can benefit each domain if appropriate common factors are discovered, and the high-order statistics (correlation information) are critical in discovering such factors

Page 47:

Thank You!

Q & A