Marine Cadoret, Sébastien Lê, Jérôme Pagès · marine.cad1.free.fr/presentation_sfc-cladag.pdf

Elements of validity in Multiple Factor Analysis

Marine Cadoret, Sébastien Lê, Jérôme Pagès

Applied Mathematics Department, Agrocampus Rennes, France

Caserta, June 11th, 2008

SFC-CLADAG (Caserta) Elements of validity in MFA 1 / 20

Context

Problem

Selection of the number of dimensions in Principal Component Analysis (PCA) :

Bar plot of the eigenvalues

Visual test : Cattell criterion

Stability in spite of perturbations in the dataset

Methods PCA

Dray, 2007 : first dimension

$X \longrightarrow \hat{X}_1 = \sqrt{\lambda_1}\, v_1 u_1'$, where $v_1$ is an eigenvector of $XX'$ and $u_1$ an eigenvector of $X'X$

Is the data reconstituted from the first dimension ($\hat{X}_1$) closer to the original data ($X$) than a random table would be?

Measure of similarity : RV coefficient (Escoufier, 1973)
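The two ingredients of this slide, the reconstitution $\hat{X}_1$ and the RV coefficient, can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the authors' code: the function names and the simulated table are assumptions made for the example, and the columns are centered before the decomposition.

```python
import numpy as np

def rv_coefficient(a, b):
    """RV coefficient between two tables describing the same rows."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    wa = a @ a.T  # cross-products between individuals for table a
    wb = b @ b.T  # same for table b
    # For symmetric matrices, sum(wa * wb) equals trace(wa @ wb).
    return np.sum(wa * wb) / np.sqrt(np.sum(wa * wa) * np.sum(wb * wb))

def first_dimension_reconstitution(x):
    """Reconstitute the centered table from its first principal dimension."""
    xc = x - x.mean(axis=0)
    # SVD of the centered table: left vectors are eigenvectors of XX',
    # right vectors are eigenvectors of X'X, singular values are sqrt(lambda).
    left, sing, right_t = np.linalg.svd(xc, full_matrices=False)
    return sing[0] * np.outer(left[:, 0], right_t[0])

# Hypothetical table: 30 individuals, 5 variables, no particular structure.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 5))
x_hat_1 = first_dimension_reconstitution(x)
print("RV(X, X_hat_1) =", rv_coefficient(x, x_hat_1))
```

For a reconstitution from a single dimension, this RV reduces to $\lambda_1 / \sqrt{\sum_k \lambda_k^2}$, so it approaches 1 when the first eigenvalue dominates.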

Is the observed RV coefficient large?

H0 : Absence of structure among variables

Procedure based on permutation tests

First dimension : permutation tests

Calculate the p-value associated with the observed RV :

1. Repeat a large number of times :
   1. Independent row permutations within each column of $X$ → $X^p$
   2. PCA on $X^p$
   3. Reconstitution of $X^p$ from the first dimension of the PCA on $X^p$ → $\hat{X}^p_1$
   4. Calculate $RV(X^p, \hat{X}^p_1)$
2. Obtain the distribution of the RV coefficient under H0
3. Locate the observed value in this distribution to get the p-value
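The steps above can be sketched as follows. This is an illustrative Python/NumPy version under stated assumptions, not the implementation used by the authors: the PCA is taken on standardised columns, the RV uses the closed form valid for a one-dimension reconstitution, the p-value counts the observed table among the permutations, and all names and the simulated data are hypothetical.

```python
import numpy as np

def rv_first_dimension(x):
    """RV between a standardised table and its first-dimension reconstitution."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    lam = np.linalg.svd(z, compute_uv=False) ** 2
    # For a rank-one reconstitution: RV = lambda_1 / sqrt(sum of lambda_k^2).
    return lam[0] / np.sqrt(np.sum(lam ** 2))

def permute_within_columns(x, rng):
    """Permute the rows independently within each column (breaks all links)."""
    return np.column_stack([rng.permutation(x[:, j]) for j in range(x.shape[1])])

def permutation_p_value(x, n_perm=199, seed=0):
    rng = np.random.default_rng(seed)
    observed = rv_first_dimension(x)
    # Step 1: RV of each permuted table; step 2: their distribution under H0.
    null = np.array([rv_first_dimension(permute_within_columns(x, rng))
                     for _ in range(n_perm)])
    # Step 3: share of values at least as large as the observed RV.
    # The "+ 1" convention (observed table counted once) is an assumption.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p_value

# Hypothetical data: 6 variables driven by one common factor plus noise.
rng = np.random.default_rng(1)
factor = rng.normal(size=(50, 1))
x = factor + 0.5 * rng.normal(size=(50, 6))
observed, null, p_value = permutation_p_value(x)
print("observed RV:", observed, "p-value:", p_value)
```

Permuting within columns leaves each column's mean and variance unchanged, so only the links between variables are destroyed, which matches H0.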

Evaluation of Dray's procedure

Behavior of the procedure under the alternative hypothesis (Dray)

Behavior of the procedure under the null hypothesis

Behavior of the procedure under H0 : first dimension

Simulation algorithm (the flow chart carries two repetition counts, ×1000 and ×10000, for its two nested loops) :

1. Simulation of a dataset $X$ under H0
2. Compute the RV between $X$ and $\hat{X}_1$
3. Repeat :
   1. Row permutations of $X$ → $X^p$
   2. PCA on $X^p$
   3. Reconstitution of the first dimension of $X^p$ → $\hat{X}^p_1$
   4. Compute the RV between $X^p$ and $\hat{X}^p_1$
4. Distribution of RV
5. Compute the p-value associated with the observed RV

Repeating steps 1 to 5 gives the distribution of the p-value under H0.
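A reduced version of this simulation can be sketched in Python with NumPy. Everything below is an assumption made for illustration: the datasets under H0 are drawn as independent Gaussian columns, the table size is arbitrary, and the loop sizes are far smaller than those on the slide so that the example runs quickly.

```python
import numpy as np

def rv_first_dimension(x):
    """RV between a standardised table and its first-dimension reconstitution."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    lam = np.linalg.svd(z, compute_uv=False) ** 2
    return lam[0] / np.sqrt(np.sum(lam ** 2))

def permutation_p_value(x, n_perm, rng):
    """Permutation p-value of the observed RV (rows permuted within each column)."""
    observed = rv_first_dimension(x)
    exceed = 0
    for _ in range(n_perm):
        xp = np.column_stack([rng.permutation(x[:, j]) for j in range(x.shape[1])])
        exceed += rv_first_dimension(xp) >= observed
    return (1 + exceed) / (n_perm + 1)

def p_values_under_h0(n_datasets=100, n_perm=99, n_rows=20, n_cols=5, seed=0):
    """Outer loop: simulate datasets without structure and collect their p-values."""
    rng = np.random.default_rng(seed)
    return np.array([
        permutation_p_value(rng.normal(size=(n_rows, n_cols)), n_perm, rng)
        for _ in range(n_datasets)
    ])

p_values = p_values_under_h0()
for alpha in (0.05, 0.20, 0.50):
    # Under H0 the share of "significant" datasets should be close to alpha.
    print(f"level {alpha:.2f}: share of significant datasets = {np.mean(p_values <= alpha):.2f}")
```

If the procedure behaves correctly under H0, the printed shares stay close to the corresponding levels, up to simulation noise.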

[Figure. Title : First dimension. x-axis : level (0.0 to 1.0). y-axis : % of datasets with significant 1st dimension (0.0 to 1.0).]

⇒ For a significance level of α%, we observe α% of data tables having a significant first dimension

Dray, 2007 : second dimension

We are in the space orthogonal to the first dimension

We use the same methodology as for the first dimension : we calculate the RV coefficient between $X - \hat{X}_1$ and $\hat{X}_2$.
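The statistic for the second dimension can be sketched as follows, in Python with NumPy. This only computes the RV between $X - \hat{X}_1$ and $\hat{X}_2$; the function names and the simulated table are assumptions made for the example, and the permutation scheme for this dimension is not shown here.

```python
import numpy as np

def rv_coefficient(a, b):
    """RV coefficient between two tables describing the same rows."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    wa, wb = a @ a.T, b @ b.T
    # For symmetric matrices, sum(wa * wb) equals trace(wa @ wb).
    return np.sum(wa * wb) / np.sqrt(np.sum(wa * wa) * np.sum(wb * wb))

def rv_second_dimension(x):
    """RV between the residual X - X_hat_1 and the reconstitution X_hat_2."""
    xc = x - x.mean(axis=0)
    left, sing, right_t = np.linalg.svd(xc, full_matrices=False)
    x_hat_1 = sing[0] * np.outer(left[:, 0], right_t[0])
    x_hat_2 = sing[1] * np.outer(left[:, 1], right_t[1])
    # The residual lives in the space orthogonal to the first dimension.
    return rv_coefficient(xc - x_hat_1, x_hat_2)

# Hypothetical table: 30 individuals, 5 variables.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 5))
print("RV(X - X_hat_1, X_hat_2) =", rv_second_dimension(x))
```

Working on the residual means the second eigenvalue is compared only with the eigenvalues that follow it, not with the first one.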

Behavior of the procedure under H0 : second dimension

Same simulation procedure

[Figure. Title : Second dimension, significance level of 20% for the first dimension. x-axis : level (0.0 to 1.0). y-axis : % of datasets with significant 2nd dimension (0.0 to 1.0).]

Particular case

⇒ Stability ≠ Significant structure

Methods MFA

Multiple Factor Analysis

Multiple Factor Analysis deals with data tables in which a set of individuals (I) is described by several groups of variables (J).

MFA highlights a structure common to all the groups, to some groups, or specific to one group.

2 main questions

Does the dimension s correspond to a structure common to several groups?

In this case, which groups contribute to this common structure?

Existence of a common structure in MFA

H0 : Absence of common structure (no links between groups)

Row permutations within each group

First dimension : Calculate the RV coefficient between $X$ and $\hat{X}_1$
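A possible sketch of this test, in Python with NumPy, is given below. It assumes the usual MFA weighting (each group of standardised variables divided by its first singular value, followed by a global PCA) and permutes whole rows within each group, independently from one group to another. The function names, the "+ 1" p-value convention and the simulated groups are assumptions made for the example, not the authors' code.

```python
import numpy as np

def mfa_table(groups):
    """Concatenate the groups after standardising and MFA-weighting each one."""
    blocks = []
    for g in groups:
        z = (g - g.mean(axis=0)) / g.std(axis=0)
        # Dividing by the first singular value balances the groups.
        blocks.append(z / np.linalg.svd(z, compute_uv=False)[0])
    return np.hstack(blocks)

def rv_first_dimension(x):
    """RV between a centered table and its first-dimension reconstitution."""
    lam = np.linalg.svd(x - x.mean(axis=0), compute_uv=False) ** 2
    return lam[0] / np.sqrt(np.sum(lam ** 2))

def common_structure_test(groups, n_perm=199, seed=0):
    rng = np.random.default_rng(seed)
    n = groups[0].shape[0]
    observed = rv_first_dimension(mfa_table(groups))
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Rows are permuted as blocks: structure inside a group is kept,
        # links between groups are destroyed (H0).
        permuted = [g[rng.permutation(n)] for g in groups]
        null[b] = rv_first_dimension(mfa_table(permuted))
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)

# Hypothetical data: two groups of variables sharing one latent factor.
rng = np.random.default_rng(1)
factor = rng.normal(size=(40, 1))
groups = [factor + 0.5 * rng.normal(size=(40, 3)),
          factor + 0.5 * rng.normal(size=(40, 4))]
observed, p_value = common_structure_test(groups)
print("observed RV:", observed, "p-value:", p_value)
```

Because the permutation keeps each group intact, a small p-value points to links between groups rather than to structure inside a single group.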

Contribution of groups to the common structure

H0 : No contribution of the group j to the common structure

First dimension : Calculate the RV coefficient between $X_j$ and $[\hat{X}_j]_1$
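The group-level statistic can be sketched in the same spirit, in Python with NumPy. Here $[\hat{X}_j]_1$ is taken to be the columns of group j in the first-dimension reconstitution of the weighted table. The slide does not state the permutation scheme for this test, so permuting the rows of group j only is an assumption of this sketch, as are the function names and the simulated groups.

```python
import numpy as np

def rv_coefficient(a, b):
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    wa, wb = a @ a.T, b @ b.T
    # For symmetric matrices, sum(wa * wb) equals trace(wa @ wb).
    return np.sum(wa * wb) / np.sqrt(np.sum(wa * wa) * np.sum(wb * wb))

def mfa_table(groups):
    """Standardise each group and divide it by its first singular value."""
    blocks = []
    for g in groups:
        z = (g - g.mean(axis=0)) / g.std(axis=0)
        blocks.append(z / np.linalg.svd(z, compute_uv=False)[0])
    return np.hstack(blocks)

def group_rv_first_dimension(groups, j):
    """RV between X_j and the columns of group j in X_hat_1."""
    x = mfa_table(groups)  # columns are already centered
    left, sing, right_t = np.linalg.svd(x, full_matrices=False)
    x_hat_1 = sing[0] * np.outer(left[:, 0], right_t[0])
    start = sum(g.shape[1] for g in groups[:j])
    cols = slice(start, start + groups[j].shape[1])
    return rv_coefficient(x[:, cols], x_hat_1[:, cols])

def group_contribution_test(groups, j, n_perm=199, seed=0):
    rng = np.random.default_rng(seed)
    n = groups[j].shape[0]
    observed = group_rv_first_dimension(groups, j)
    null = np.empty(n_perm)
    for b in range(n_perm):
        permuted = list(groups)
        # Assumed scheme: only the rows of group j are permuted.
        permuted[j] = groups[j][rng.permutation(n)]
        null[b] = group_rv_first_dimension(permuted, j)
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)

# Hypothetical data: groups 0 and 1 share a factor, group 2 is pure noise.
rng = np.random.default_rng(1)
factor = rng.normal(size=(40, 1))
groups = [factor + 0.5 * rng.normal(size=(40, 3)),
          factor + 0.5 * rng.normal(size=(40, 4)),
          rng.normal(size=(40, 3))]
for j in range(len(groups)):
    print("group", j, group_contribution_test(groups, j))
```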

Application

Classical example of MFA (INRA Angers, Agrocampus Rennes, Spad, FactoMineR)

21 wines described by 27 variables gathered into 4 groups :

Olfaction before shaking : 5 variables

Vision : 3 variables

Olfaction after shaking : 10 variables

Gustation : 9 variables

Expected results :

Dim.1 Dim.2 Dim.3 Dim.4

Olfaction before shaking × × ×
Vision ×
Olfaction after shaking × × ×
Gustation × ×

Application : Number of dimensions

        λ      P-value
Dim.1   3.46   < 0.001
Dim.2   1.37   < 0.001
Dim.3   0.62   0.004
Dim.4   0.37   0.15

Application : Contribution of the groups

Contribution

                           Dim.1     Dim.2     Dim.3     Dim.4
Olfaction before shaking   0.78      0.62      0.37      0.17
Vision                     0.85      0.04      0.01      0.05
Olfaction after shaking    0.92      0.47      0.18      0.10
Gustation                  0.90      0.24      0.05      0.05
Sum                        3.46      1.37      0.62      0.37

P-value

                           Dim.1     Dim.2     Dim.3     Dim.4
Olfaction before shaking   0.02      0.174     0.038     0.127
Vision                     0.007     0.104     0.387     0.149
Olfaction after shaking    < 0.001   0.004     < 0.001   0.638
Gustation                  < 0.001   0.002     0.278     0.39

Conclusion, perspective

Dray's procedure extended to MFA

Ambiguity between stability and significant structure

Implementation of systematic simulations in MFA

http://factominer.free.fr

R package dedicated to exploratory analysis, written by the Applied Mathematics Department of Agrocampus
