
Shared Space Component Analysis

Mihalis A. Nicolaou
Department of Computing, Imperial College London, UK
iBug Talk
[email protected]
February 2015

Shared Space Component Analysis 1 / 28

Intro

Component Analysis (CA): a collection of statistical methods aiming at the factorisation of a signal into components pertaining to a particular task (e.g., predictive analysis, clustering), usually of reduced dimensionality.

• CA is utilised in the vast majority of ML and CV systems (e.g., PCA).
• Closely related to the process of dimensionality reduction, specifically w.r.t. feature extraction.


Roots of CA lie in Principal CA (PCA)

On lines and planes of closest fit to systems of points in space (Pearson, 1901)

The figure from Pearson's paper points out that the line of best fit is the direction of maximum variance; note also that the line of worst fit is orthogonal to the line of best fit.


Roots of CA lie in Principal CA (PCA)

• PCA was independently developed in 1933 by Hotelling, who coined the term "Principal Components".
• In PCA, and in linear CA in general, we aim to learn a linear transformation Y = W^T X, where X is the observation matrix.
• Y is the latent, unobserved space which satisfies certain properties. In the case of PCA, Y needs to maximally preserve variance (Hotelling), or, equivalently, minimise the reconstruction error (Pearson).
• Similarly to PCA, many other well-known CA techniques deal with just one set of observations.
• In order to define a notion of a "shared space", though, we need more than one set of observations.
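As a concrete illustration, a minimal PCA sketch in NumPy (not from the talk; the observation matrix X is features × samples, matching the Y = W^T X convention above):

```python
import numpy as np

def pca(X, n_components):
    """PCA on an observation matrix X (features x samples): returns W and Y = W^T X."""
    # Center the observations
    Xc = X - X.mean(axis=1, keepdims=True)
    # Sample covariance Sigma_XX
    cov = Xc @ Xc.T / Xc.shape[1]
    # Eigendecomposition; eigh returns eigenvalues in ascending order
    evals, evecs = np.linalg.eigh(cov)
    W = evecs[:, ::-1][:, :n_components]  # directions of maximal variance
    return W, W.T @ Xc
```

Projecting onto the leading eigenvectors of Σ_XX maximally preserves variance (Hotelling's view) and, equivalently, minimises the reconstruction error (Pearson's view).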


Canonical Correlation Analysis

• The most well-known method belonging to the so-called Shared-Space Component Analysis family is Canonical Correlation Analysis (CCA).
• Introduced by Hotelling in 1936 (Relations Between Two Sets of Variates, Biometrika), three years after he proposed PCA.
• CCA is a natural extension of PCA to two datasets: instead of maximally preserving variance, we maximise the correlation (covariance) between the datasets.

PCA → max W^T Σ_{XX} W = W^T X X^T W, where Y = W^T X
CCA → max W1^T Σ_{X1X2} W2 = W1^T X1 X2^T W2, where Yi = Wi^T Xi

where Cov(X, X) = Var(X).


CCA is a natural extension of PCA (More Formally)

CCA: Given observations X1 ∈ R^{F1×T}, X2 ∈ R^{F2×T}, infer maximally correlated latent spaces Y1 ∈ R^{N×T}, Y2 ∈ R^{N×T}.

• Assuming linear projections, Yi = Wi^T Xi, we can maximise the standard Pearson correlation coefficient, i.e.:

  ρ = Cov(Y1, Y2) / (σ_Y1 σ_Y2) = E[Y1 Y2] / √(E[Y1²] E[Y2²])
    = (W1^T Σ_{X1X2} W2) / (√(W1^T Σ_{X1X1} W1) √(W2^T Σ_{X2X2} W2))

• Due to the scale invariance of correlation w.r.t. the loadings, this is equivalent to

  max W1^T Σ_{X1X2} W2, s.t. Wi^T Σ_{XiXi} Wi = I, i ∈ {1, 2}.

• An equivalent least-squares formulation:

  min ||W1^T X1 − W2^T X2||²_F, s.t. Wi^T Σ_{XiXi} Wi = I, i ∈ {1, 2}.
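One standard way to solve the constrained problem above is whitening followed by an SVD; a minimal NumPy sketch (not from the talk; the ridge term `reg` is an added assumption for numerical stability):

```python
import numpy as np

def cca(X1, X2, n_components=2, reg=1e-6):
    """Linear CCA via whitening + SVD (matrices are features x samples)."""
    X1 = X1 - X1.mean(axis=1, keepdims=True)
    X2 = X2 - X2.mean(axis=1, keepdims=True)
    T = X1.shape[1]
    # (Cross-)covariances, with a small ridge so the whitening is well defined
    S11 = X1 @ X1.T / T + reg * np.eye(X1.shape[0])
    S22 = X2 @ X2.T / T + reg * np.eye(X2.shape[0])
    S12 = X1 @ X2.T / T

    def inv_sqrt(S):
        # S^{-1/2} via eigendecomposition of the symmetric matrix S
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # SVD of the whitened cross-covariance gives the canonical directions
    K = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    U, s, Vt = np.linalg.svd(K)
    W1 = inv_sqrt(S11) @ U[:, :n_components]
    W2 = inv_sqrt(S22) @ Vt.T[:, :n_components]
    return W1, W2, s[:n_components]  # s holds the canonical correlations
```

By construction W_i^T Σ_{XiXi} W_i ≈ I, so the constraints of the formulation above are satisfied.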


Time Warping is closely related to CA

Dynamic Time Warping (DTW)
Given X1 ∈ R^{D×T1}, X2 ∈ R^{D×T2} (equal D required),

  arg min_{Δ1, Δ2} ||X1 Δ1 − X2 Δ2||²_F, s.t. ...

where Δ1 ∈ {0,1}^{T1×TΔ}, Δ2 ∈ {0,1}^{T2×TΔ} are binary selection matrices, effectively re-mapping the samples in X1, X2. The optimal path is inferred by dynamic programming in O(T1 T2).

Least-Squares CCA
Given X1 ∈ R^{D1×T}, X2 ∈ R^{D2×T} (equal T required),

  arg min_{W1, W2} ||W1^T X1 − W2^T X2||²_F, s.t. ...

Canonical Time Warping (CTW) - Zhou and Torre, 2009
Given X1 ∈ R^{D1×T1}, X2 ∈ R^{D2×T2} (D1 may differ from D2, and T1 from T2),

  arg min_{W1, W2, Δ1, Δ2} ||W1^T X1 Δ1 − W2^T X2 Δ2||²_F, s.t. ...

CTW is solved by alternating optimisation: with Δ1, Δ2 fixed, solving for W1, W2 is a (least-squares) CCA step; with W1, W2 fixed, solving for Δ1, Δ2 is a DTW step.

Shared Space Component Analysis 7 / 28
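The DTW objective can be minimised exactly by dynamic programming over the alignment path; a minimal sketch (not from the talk), with sequences stored as D × T matrices as above:

```python
import numpy as np

def dtw(X1, X2):
    """DTW between two sequences (columns are samples); returns cost and path."""
    T1, T2 = X1.shape[1], X2.shape[1]
    # Pairwise squared Euclidean distances between frames
    d = ((X1[:, :, None] - X2[:, None, :]) ** 2).sum(axis=0)
    # Accumulated-cost table, padded with an infinite border
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack the optimal alignment path
    path, i, j = [], T1, T2
    while i > 0 or j > 0:
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((s for s in steps if s[0] >= 0 and s[1] >= 0), key=lambda s: D[s])
    return D[T1, T2], path[::-1]
```

The path plays the role of the selection matrices Δ1, Δ2: each pair (t1, t2) on it selects one column from each sequence.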

So far...

• So far we have talked about PCA, and how CCA is a shared space component analysis method which can be considered a generalisation of PCA to multiple datasets.
• We have also seen how least-squares CCA can be elegantly combined with Dynamic Time Warping (DTW).
• We will now look into probabilistic interpretations of CCA, which are of particular interest.


A Probabilistic Interpretation of CCA

Given datasets X1 ∈ R^{D1×T}, X2 ∈ R^{D2×T}, with Xi = {x_{i,1}, ..., x_{i,T}}, the generative model is defined as:

  x_{1,n} = W1 z_n + ε1
  x_{2,n} = W2 z_n + ε2
  εi ∼ N(0, σi I), z_n ∼ N(0, I)

(Graphical model: the latent z_n generates x_{1,n} through W1 and x_{2,n} through W2.)

• The Maximum Likelihood (ML) parameter estimates of this model have been shown to be equivalent to deterministic CCA (Bach and Jordan, 2005).
• Note that in this generative model, the "shared space" Z is now defined explicitly.

This is actually a special case of a model introduced in 1958: Inter-Battery Factor Analysis (IBFA).
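To make the generative model concrete, a small simulation (a sketch, not from the talk; σi is taken here as the noise standard deviation) samples from it and checks the implied cross-covariance Cov(x1, x2) = W1 W2^T:

```python
import numpy as np

rng = np.random.default_rng(0)
D1, D2, N, T = 4, 3, 2, 200_000
W1 = rng.normal(size=(D1, N))
W2 = rng.normal(size=(D2, N))
sigma1, sigma2 = 0.3, 0.5

# x_{i,n} = W_i z_n + eps_i, with z_n ~ N(0, I) shared across the two views
Z = rng.normal(size=(N, T))
X1 = W1 @ Z + sigma1 * rng.normal(size=(D1, T))
X2 = W2 @ Z + sigma2 * rng.normal(size=(D2, T))

# Independent noise terms imply Cov(x1, x2) = W1 W2^T
emp_cross = X1 @ X2.T / T
print(np.abs(emp_cross - W1 @ W2.T).max())  # small for large T
```

All dependence between the two views flows through the shared z_n, which is exactly what CCA recovers deterministically.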


Inter-Battery Factor Analysis (IBFA)

A battery (of tests) refers to a series of psychological, behavioural or cognitive assessment tests. The term was often used in statistics, since data from multiple batteries were essentially among the first datasets consisting of multiple modalities.

• Many seminal works have been published in journals such as Psychometrika (devoted to the advancement of theory and methodology for behavioural data).
• A prominent example is Tucker's Inter-Battery Factor Analysis (IBFA) (1958, Psychometrika).
• CCA was introduced out of the increasing need for analysing multiple sets of data; it is considered the first "shared-space" model.
• Similarly, IBFA is the first model to extend shared-space models to the private-shared space paradigm.


Private-Shared Space Models

Given datasets X1 ∈ R^{D1×T}, X2 ∈ R^{D2×T}, the generative model is defined as:

  x_{1,n} = W1 z_n + B1 z_{1,n} + ε1
  x_{2,n} = W2 z_n + B2 z_{2,n} + ε2
  z_n, z_{1,n}, z_{2,n} ∼ N(0, I)

(Graphical model: the shared z_n loads on both views through W1, W2, while the private z_{1,n}, z_{2,n} load on their own views through B1, B2.)

• The generative interpretation of CCA (Bach and Jordan, 2005) is essentially equivalent to a special case of the probabilistic interpretation of IBFA (Browne, 1979).
• Kaski and Klami (2008) re-introduce IBFA as Probabilistic CCA (PCCA), providing an EM algorithm. The terms have been used interchangeably (Kaski and Klami, JMLR, 2013).
• IBFA actually complements CCA by providing a description of the variation not captured by the correlating components.
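A small simulation (a sketch, not from the talk; isotropic noise with standard deviation 0.1 is an added assumption) illustrates that the private factors enter each view's marginal covariance but leave the cross-covariance governed by the shared space alone:

```python
import numpy as np

rng = np.random.default_rng(0)
D1, D2, Ns, Np, T = 4, 3, 2, 2, 200_000
W1, W2 = rng.normal(size=(D1, Ns)), rng.normal(size=(D2, Ns))
B1, B2 = rng.normal(size=(D1, Np)), rng.normal(size=(D2, Np))

# Shared factor z_n loads on both views; private z_{i,n} on one view each
Z = rng.normal(size=(Ns, T))
Z1, Z2 = rng.normal(size=(Np, T)), rng.normal(size=(Np, T))
X1 = W1 @ Z + B1 @ Z1 + 0.1 * rng.normal(size=(D1, T))
X2 = W2 @ Z + B2 @ Z2 + 0.1 * rng.normal(size=(D2, T))

# Marginal covariance of view 1 mixes shared, private and noise terms
marg1 = X1 @ X1.T / T  # approx. W1 W1^T + B1 B1^T + 0.01 I

# Cross-covariance is governed by the shared space only: Cov(x1, x2) = W1 W2^T
cross = X1 @ X2.T / T
print(np.abs(cross - W1 @ W2.T).max())  # private factors leave no trace here
```

This is the sense in which IBFA "complements" CCA: B1, B2 describe exactly the per-view variation that the correlating components ignore.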


Page 39: Shared Space Component Analysis - Imperial College London · 2015-11-12 · Shared Space Component Analysis Intro Component Analysis (CA): Collection of statistical methods aiming

Shared Space Component Analysis

An interesting observation...

Browne (1979) clarifies.

• The generative formulation maintains a single latent variable z thatcaptures the shared variation, whereas classical CCA results in twoseparate but correlating variables obtained by projecting.

→ i.e., P(Z|X1,X2) and P(Z|X1) vs. WᵀX1.

Based on this, Kaski and Klami (2013) note:

• “the extended model provides novel application opportunities not immediately apparent in the more restricted CCA model”

Shared Space Component Analysis 12 / 28


Shared Space Component Analysis

So far...

• We have talked about CCA (shared space) and IBFA (private-shared space).

• We referred to their probabilistic interpretations.

• We have also seen how DTW can be elegantly combined with component analysis.

• In what follows, we present a case study of a private-shared space model applied to the problem of fusing multiple subjective annotations.

Shared Space Component Analysis 13 / 28


Shared Space Component Analysis

Continuous Subjective Annotations

Problem: Given multiple annotations (time-series) X = {X1, . . . , XN}, infer the common, consensus annotation (i.e., the “ground truth”).

[Figure: valence over frames for multiple annotators, illustrating spike noise, annotator bias, and temporal lags.]

Challenges and Proposed Methodology (P/S).

• Account for annotator-specific bias and noise → private space

• Model the common signal shared by all annotators → shared space

• Account for temporal discrepancies amongst annotators → time warping

Shared Space Component Analysis 14 / 28
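To make the three nuisance factors concrete, one can simulate annotators from a common latent signal; a hypothetical sketch, where the underlying sinusoidal “valence” signal and the per-annotator bias/lag/noise values are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1500
frames = np.arange(T)
truth = np.sin(2 * np.pi * frames / 500)          # underlying valence signal

annotations = []
for bias, lag, noise in [(0.2, 0, 0.05), (-0.1, 40, 0.05), (0.0, 80, 0.2)]:
    x = np.roll(truth, lag) + bias                # annotator-specific lag and bias
    x += noise * rng.normal(size=T)               # annotator-specific noise
    annotations.append(x)
```

Private spaces would absorb the bias and noise terms, the shared space the common sinusoid, and time warping the lags (here crudely simulated with a circular shift).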


Shared Space Component Analysis

Dynamic Probabilistic CCA (DPCCA)

The generative model of DPCCA is as follows1.

xi,t = Wi zt + Bi zi,t + εi, εi ∼ N(0, σi²I)

P(zt | zt−1) = N(Az zt−1, VZ)

P(zi,t | zi,t−1) = N(Azi zi,t−1, VZi)

[Graphical model: observations xi,t, xk,t; shared latent zt and private latents zi,t, zk,t, each with first-order temporal links zt−1 → zt; loadings Wi, Wk and Bi, Bk.]

• DPCCA models temporal dynamics in both the private and shared latent spaces by incorporating a linear dynamical system prior.

• First-order moments are obtained by applying Rauch-Tung-Striebel (RTS) smoothing.

• Inherent model flexibility. Application-wise, this translates to, e.g., being able to condition the shared space (i.e., the “ground truth”) on any subset of the available annotators, i.e., P(Z|X1), P(Z|X1, . . . , XN).

1Nicolaou et al. @ ECCV ’12 & TPAMI ’14

Shared Space Component Analysis 15 / 28
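The linear dynamical system prior on the shared space can be simulated directly; a sketch where the transition matrix Az and process noise VZ are arbitrary illustrative choices (a stable transition with small noise), not learnt parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
T, dz = 200, 2
Az = 0.95 * np.eye(dz)      # assumed stable transition matrix
VZ = 0.05 * np.eye(dz)      # process noise covariance

# Sample the shared latent chain: z_t | z_{t-1} ~ N(Az z_{t-1}, VZ)
Z = np.zeros((dz, T))
for t in range(1, T):
    Z[:, t] = Az @ Z[:, t - 1] + rng.multivariate_normal(np.zeros(dz), VZ)
```

In the full model, the posterior moments of such chains are recovered by Kalman filtering followed by RTS smoothing, conditioned on whichever subset of views is available.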


Shared Space Component Analysis

Dynamic Probabilistic CCA (DPCCA)

• DPCCA enables inferring the individual, annotator-specific bias, modelling the annotation variance, and discovering the underlying signal shared by all annotators, i.e., Z|X1, . . . , XN, while modelling temporal dynamics.

• Nevertheless, temporal discrepancies in the annotations introduce noise in the derived spaces (e.g., lagging peaks will remain misaligned).

Shared Space Component Analysis 16 / 28


Shared Space Component Analysis

DPCCA with Time Warpings

Solution

Incorporate time warpings into DPCCA in order to eliminate temporal discrepancies in the “clean”, shared space.

L(D)PCTW = (1/N(N−1)) ∑ᵢ ∑_{j≠i} ‖E[Z|Xi]Δi − E[Z|Xj]Δj‖²F

[Graphical model: each annotation Xi (i = 1, . . . , N) with individual factors Wi, Bi, σi², private latents zi,1, . . . , zi,T, and a warping process Δi mapping onto the warped shared latent space Z = {z1, . . . , zT} with transition Az.]

Shared Space Component Analysis 17 / 28
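The alignment objective above can be evaluated for a given set of posterior expectations and warpings; a minimal sketch, where each warping Δi is represented as a column-selection index array (in the actual method the warpings are obtained by DTW-style optimisation, not given):

```python
import numpy as np

def dpctw_cost(EZ, warps):
    """Pairwise cost: sum_{i != j} ||E[Z|Xi] D_i - E[Z|Xj] D_j||_F^2 / (N(N-1)),
    with each warping given as an index array selecting/repeating columns."""
    N = len(EZ)
    warped = [Z[:, w] for Z, w in zip(EZ, warps)]
    cost = sum(np.linalg.norm(warped[i] - warped[j], 'fro') ** 2
               for i in range(N) for j in range(N) if i != j)
    return cost / (N * (N - 1))

# Two identical posteriors under identity warpings incur zero cost.
Z1 = np.arange(6.0).reshape(2, 3)
idx = np.arange(3)
cost0 = dpctw_cost([Z1, Z1.copy()], [idx, idx])
```

Minimising this cost over the warpings (with the DPCCA posteriors fixed, and vice versa) is what couples the time-warping step with the latent-space estimation.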


Shared Space Component Analysis

DPCCA with Time Warpings: Experiment I

[Figure: (a) 2D spiral; (b) noisy spirals generated from (a); (c) shared space given all annotations, E[Z|X1, . . . , Xn] (“ground truth”, PCCA); (d) shared space given all annotations (“ground truth”, DPCCA); (e) aligned shared space given each annotation, E[Z|Xi] (DPCCA).]

Shared Space Component Analysis 18 / 28


Shared Space Component Analysis

DPCCA with Time Warpings: Experiment II

DPCTW applied to continuous emotion annotations.

[Figure (valence over frames): (a) original annotations X; (b) E[Zi|Xi, Δi] (PCTW); (c) E[Zi|Xi, Δi] (DPCTW); (d) E[Z|X1, Δ1, . . . , XN, ΔN] (PCTW); (e) E[Z|X1, Δ1, . . . , XN, ΔN] (DPCTW).]

Shared Space Component Analysis 19 / 28


Shared Space Component Analysis

DPCCA and Annotator Ranking.

Aumann’s agreement theorem

“If two people are Bayesian rationalists with common priors, and if they have common knowledge of their individual posteriors, then their posteriors must be equal.” - Agreeing to disagree, R. J. Aumann, Annals of Statistics, 1976.

• If annotators are rational in a Bayesian sense, the shared-space posteriors (common knowledge) conditioned on each annotator (Z|Xi) should be close to each other.

• We can rank the annotators based on the latent posterior by computing a probabilistic measure of difference (e.g., the KL divergence).

• This can detect “bad” annotators, e.g., adversarial or malicious annotators, spammers, etc.

Shared Space Component Analysis 20 / 28
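A minimal sketch of such a ranking, assuming Gaussian posteriors; the per-annotator posterior means and covariances below are synthetic placeholders standing in for the DPCCA posteriors Z|Xi and the consensus posterior Z|X1, . . . , XN:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) for multivariate Gaussians."""
    d = mu0.shape[0]
    S1inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1inv @ S0) + diff @ S1inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Rank annotators by divergence of Z|Xi from the consensus posterior.
mu_all, S_all = np.zeros(2), np.eye(2)
posteriors = {                       # synthetic per-annotator posteriors
    'good': (np.array([0.1, 0.0]), np.eye(2)),
    'bad':  (np.array([3.0, -2.0]), 4.0 * np.eye(2)),
}
scores = {name: kl_gauss(mu, S, mu_all, S_all)
          for name, (mu, S) in posteriors.items()}
ranking = sorted(scores, key=scores.get)  # most consistent annotator first
```

Annotators whose conditioned posterior diverges strongly from the consensus (large KL) are flagged as unreliable.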


Shared Space Component Analysis

DPCCA and Annotator Ranking: Experiment

A random, structured annotation (sinusoid).

[Figure: annotations X; derived annotation E[Z|X]; individual spaces; distance matrix on the per-annotator posteriors E[Zi|Xi] given Z. Series 1-5: true annotations, 6: sinusoid, 7: shared space.]

Shared Space Component Analysis 21 / 28


Shared Space Component Analysis

DPCCA and Annotator Ranking: Experiment

Set of random annotations.

[Figure: distance matrix on X (1-5: true, 6-14: random); distance matrix on the per-annotator posteriors (1-5: true, 6-14: random, 15: shared space); KL divergence between Z|Xi and Z; inferred weight vector.]

Shared Space Component Analysis 22 / 28


Shared Space Component Analysis

DPCCA and Annotator Ranking: Experiment

Two clusters of annotations with facial points as features.

[Figure: annotations X with true (1-6) and irrelevant (7-12) clusters; distance matrix on X (1-6: true, 7-12: irrelevant); distance matrix on the per-annotator posteriors with facial points Y as an additional view (13: facial points).]

Shared Space Component Analysis 23 / 28


Shared Space Component Analysis

Robust Canonical Correlation Analysis (RCCA)

RCCA is a robust-to-gross-noise variant of CCA. It is motivated by the wide presence of non-Gaussian noise in features extracted under unregulated, real-world conditions, e.g.,

→ in Visual Features: texture occlusions, tracking errors, spike noise.

→ in Audio Features: irrelevant (uncorrelated) sounds, equipment noise.

Shared Space Component Analysis 24 / 28


Shared Space Component Analysis

RCCA Formulation (1/2)

Given high-dimensional feature spaces Z ∈ Rdz×T and A ∈ Rda×T (representing, e.g., video/audio cues), we pose the problem

argmin_{Pz,Pa,Ez,Ea} rank(Pz) + rank(Pa) + λ1‖Ez‖0 + λ2‖Ea‖0 + (µ/2)‖PzZ − PaA‖²F

s.t. Z = PzZ + Ez, A = PaA + Ea. (1)

→ The Frobenius coupling term ‖PzZ − PaA‖²F corresponds to the least-squares formulation of CCA (LS-CCA).

Pz, Pa: Low-rank subspaces spanning the correlated observations.

Ez, Ea: Capture the uncorrelated components, accounting for noise/outliers.

λ1, λ2, µ: non-negative parameters.

→ λ tunes the contribution of each signal to the clean space.

Problem (1) is difficult to solve due to the discrete nature of the rank function and the ℓ0 norm (the problem is NP-hard).

Shared Space Component Analysis 25 / 28


Shared Space Component Analysis

Part II: RCCA Formulation (2/2)

Solution: Adopt a convex relaxation of (1) by replacing the ℓ0 norm and the rank function with their convex envelopes, the ℓ1 norm and the nuclear norm respectively, as follows:

argmin_{Pz,Pa,Ez,Ea} ‖Pz‖∗ + ‖Pa‖∗ + λ1‖Ez‖1 + λ2‖Ea‖1 + (µ/2)‖PzZ − PaA‖²F

s.t. Z = PzZ + Ez, A = PaA + Ea. (2)

→ Problem (2) can be solved by employing the Linearized Alternating Directions Method (LADM), a variant of the Alternating Direction Augmented Lagrange Multiplier method.

Shared Space Component Analysis 26 / 28
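The LADM iterations for problem (2) reduce to closed-form proximal steps for the two relaxed regularisers; a sketch of those two building blocks only (not the full RCCA solver), using standard shrinkage and singular value thresholding:

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau * ||.||_1: element-wise shrinkage."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Proximal operator of tau * ||.||_*: singular value thresholding,
    i.e., shrink the singular values and reassemble the matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

# Shrinking singular values below tau to zero reduces the rank,
# which is how the nuclear-norm terms drive Pz, Pa towards low rank.
X = np.diag([3.0, 1.0, 0.2])
low_rank = svt(X, 0.5)
```

Within LADM, the ℓ1 terms on Ez, Ea are handled by soft_threshold and the nuclear-norm terms on Pz, Pa by svt, with the linear constraints enforced through augmented Lagrange multipliers.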


Shared Space Component Analysis

Wrapping up

• There are many other private-shared space models, conceptually similar to IBFA, such as Common and Individual Features Analysis (COBE) and Joint and Individual Variation Explained (JIVE).

• The private (or individual) space can be extremely useful in many applications, since it can be discriminative (e.g., face clustering). Prince, Elder et al. propose PLDA (for inferences about identity; ICCV 2007, TPAMI 2012) based on a setting similar to IBFA.

• Partial CCA is another interesting variant: eliminate the influence of a third variable, and subsequently project the observations onto a maximally correlated space (cf. Mukuta, ICML 2014).

• There are also several works on non-linear shared-space models based on GPs.

Shared Space Component Analysis 27 / 28


Shared Space Component Analysis

Thank you.

Shared Space Component Analysis 28 / 28