Université d’Ottawa / University of Ottawa 2001 Bio 8100s Applied Multivariate Biostatistics L3.1 Lecture 3: A brief background to multivariate statistics

Univariate versus multivariate statistics
The material of multivariate analysis
Displaying multivariate data
The uses of multivariate statistics
A refresher of matrix algebra

Multivariate versus univariate statistics

In univariate statistical analysis, we are concerned with analyzing variation in a single random variable.

In multivariate statistical analysis, we are concerned with analyzing variation in several random variables, which may or may not be related.

The material of multivariate analysis

Multivariate data consist of a set of measurements (usually related) of P variables X1, X2, …, XP on n sample units.

The variables Xj may be ratio, ordinal, or nominal.

Sample   Variable 1   Variable 2   ...   Variable P
1        X11          X21          ...   XP1
2        X12          X22          ...   XP2
...      ...          ...          ...   ...
n        X1n          X2n          ...   XPn

Example 1: Bumpus’ sparrow data

5 morphological measurements (in mm) of 49 sparrows recovered from a storm in 1898.

Bird   Length   Alar extent   Head length   Humerus length   Keel length
1      156      245           31.6          18.5             20.5
2      154      240           30.4          17.9             19.6
...    ...      ...           ...           ...              ...
49     164      248           32.3          18.8             20.9

Example 2: Biodiversity of SE Ontario wetlands

Species richness (number of species) of 5 different taxa in 57 wetlands in southeastern Ontario.

Wetland   Birds   Amphibians   Reptiles   Mammals   Plants
1         82      7            4          9         223
2         36      2            1          3         119
...       ...     ...          ...        ...       ...
57        61      4            2          6         173

The material of multivariate analysis

In some applications, the measured variables comprise both independent (X) and dependent (Y) variables.

Sample   Independent variable 1   Independent variable 2   Dependent variable 1   Dependent variable 2
1        X11                      X21                      Y11                    Y21
2        X12                      X22                      Y12                    Y22
...      ...                      ...                      ...                    ...
n        X1n                      X2n                      Y1n                    Y2n

Example 1: Pgi frequencies in California Euphydryas editha colonies in relation to environmental factors

Colony   Pgi1   Pgi2   Annual precip. (in.)   Altitude (ft)   Annual max. temp.
SS       .22    .57    43                     500             98
SB       .20    .38    20                     800             92
GL       .01    .92    50                     10,500          81

Example 2: Anurans in SE Ontario wetlands in relation to surrounding forest cover and road densities

Wetland   LF   GTF   MF   Road density (1 km)   Forest cover (1 km)
1         1    0     0    20.2                  0.10
2         1    1     0    6.2                   0.90
3         1    0     0    12.6                  0.35
4         1    1     1    0.02                  0.95

Multivariate LS estimators

The vector of sample means and the matrix of sample variances and covariances are estimates of the true (“population”) means, variances and covariances.

As such, inferences about the latter based on the former assume random sampling.

$$\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p), \qquad \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}$$

$$s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2$$

$$c_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k)$$

Figure: a sample drawn from a population; the sample is summarized by $\bar{\mathbf{x}}$ and $\mathbf{C}$.

The sample covariance matrix

The sample covariance matrix is a square matrix whose diagonal elements give the sample variances for each measured variable ($s_i^2$), and whose off-diagonal elements are the sample covariances between pairs of variables ($c_{ik}$).

$$s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2, \qquad c_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k)$$

$$\mathbf{C} = \begin{bmatrix} s_1^2 & c_{12} & \cdots & c_{1m} \\ c_{21} & s_2^2 & \cdots & c_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & s_m^2 \end{bmatrix}$$
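As a minimal sketch of these definitions, the snippet below builds C element by element and compares it with NumPy's own estimator. It assumes the data are stored with sample units as rows and variables as columns (the transpose of the $x_{ij}$ indexing used on the slides), and it uses only the three sparrows listed in the Bumpus example (birds 1, 2 and 49, total length and alar extent), purely for illustration.

```python
import numpy as np

# Rows are sample units, columns are variables (length, alar extent, in mm).
# Only three of the 49 birds are used here; this is an illustrative subset.
X = np.array([[156.0, 245.0],
              [154.0, 240.0],
              [164.0, 248.0]])
n, m = X.shape

# Vector of sample means, one entry per variable.
x_bar = X.sum(axis=0) / n

# Fill C directly from the definition of s_i^2 (i == k) and c_ik (i != k).
C = np.empty((m, m))
for i in range(m):
    for k in range(m):
        C[i, k] = np.sum((X[:, i] - x_bar[i]) * (X[:, k] - x_bar[k])) / (n - 1)

# NumPy's estimator uses the same n - 1 denominator by default.
C_numpy = np.cov(X, rowvar=False)

print(x_bar)
print(C)
```

The loop mirrors the formulas term by term; in practice `np.cov` does the same job in one call.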

A review of matrix algebra

A matrix of size m x n is an array of numbers (either real or complex) with m rows and n columns.

Matrices with one column are column vectors, matrices with one row are row vectors.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad \mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}, \qquad \mathbf{r} = (r_1, r_2, \ldots, r_n)$$

Special matrices

A zero matrix 0 has all elements equal to zero.

A diagonal matrix T is a square matrix (m = n) with all elements equal to zero except those on the main diagonal.

An identity matrix I is a diagonal matrix with all diagonal terms equal to one.

$$\mathbf{0} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix} t_1 & 0 & \cdots & 0 \\ 0 & t_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & t_n \end{bmatrix}, \qquad \mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
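A short NumPy sketch of the three special matrices may help fix the definitions. The diagonal entries of T below are arbitrary values chosen for illustration, not values from the lecture.

```python
import numpy as np

# 3 x 3 zero matrix: every element is zero.
Z = np.zeros((3, 3))

# Diagonal matrix: zero everywhere except the main diagonal.
# The diagonal values are arbitrary illustrative numbers.
T = np.diag([2.0, 5.0, 7.0])

# Identity matrix: a diagonal matrix whose diagonal terms all equal one.
I = np.eye(3)

print(Z)
print(T)
print(I)
```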

Matrix operations

The transpose of a matrix A ($\mathbf{A}^T$) is obtained by interchanging rows and columns.

The transpose of a row vector is a column vector, and the transpose of a column vector is a row vector.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad \mathbf{A}^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}$$

$$\mathbf{c} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix}, \qquad \mathbf{c}^T = (c_1, c_2, \ldots, c_m)$$

The trace of a matrix

The trace of a matrix A, denoted tr(A), is the sum of the diagonal elements.

The trace is defined only for square matrices.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \qquad \mathrm{tr}(\mathbf{A}) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}$$
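The transpose and the trace can be checked numerically with a small sketch such as the one below. The matrices M and S are made-up examples, not taken from the lecture.

```python
import numpy as np

# A 2 x 3 matrix (arbitrary illustrative values).
M = np.array([[1, 2, 3],
              [4, 5, 6]])

# Transposing interchanges rows and columns, giving a 3 x 2 matrix.
M_t = M.T

# A square matrix, for which the trace is defined.
S = np.array([[2, 1],
              [1, 2]])

# Trace as the explicit sum of diagonal elements, and via NumPy.
trace_by_hand = sum(S[i, i] for i in range(S.shape[0]))
trace_numpy = np.trace(S)

print(M_t)
print(trace_by_hand, trace_numpy)
```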

Matrix addition and subtraction

Two matrices A and B are conformable for addition if they are of the same size (same numbers of rows and columns).

The resulting matrix A + B (A - B) is obtained by adding (subtracting) individual matrix elements.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$$

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}, \qquad \mathbf{A} - \mathbf{B} = \begin{bmatrix} a_{11} - b_{11} & a_{12} - b_{12} \\ a_{21} - b_{21} & a_{22} - b_{22} \end{bmatrix}$$

Matrix multiplication by a scalar

The multiplication of a matrix A by a scalar k involves multiplying each element of A by k.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad k\mathbf{A} = \begin{bmatrix} k a_{11} & k a_{12} \\ k a_{21} & k a_{22} \end{bmatrix}$$

Matrix multiplication

Two matrices A (m x n) and B (n x p) are conformable for multiplication (A • B) if the number of columns in A equals the number of rows in B.

A • B and B • A are both defined only when A is m x n and B is n x m (for example, when both are square matrices of the same size), but even then, in general A • B ≠ B • A.

$$\mathbf{A}\mathbf{B} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{np} \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{n} a_{1j} b_{j1} & \cdots & \sum_{j=1}^{n} a_{1j} b_{jp} \\ \vdots & & \vdots \\ \sum_{j=1}^{n} a_{mj} b_{j1} & \cdots & \sum_{j=1}^{n} a_{mj} b_{jp} \end{bmatrix}$$

$$\mathbf{A} = \begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{B} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad \mathbf{A}\mathbf{B} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad \mathbf{B}\mathbf{A} = \begin{bmatrix} 3 & 0 \\ 1 & 1 \end{bmatrix}, \qquad \mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$$
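The non-commutativity of matrix multiplication is easy to confirm numerically. The sketch below assumes the example matrices are A = [[2, -1], [1, 1]] and B = [[1, 1], [0, 1]], which is the reading of the slide's worked example that is consistent with both products; the non-square matrices P and Q are hypothetical additions used only to show the shapes of the two products.

```python
import numpy as np

# Example matrices (as read from the slide's worked example).
A = np.array([[2, -1],
              [1,  1]])
B = np.array([[1, 1],
              [0, 1]])

# The @ operator performs matrix multiplication (row-by-column sums).
AB = A @ B
BA = B @ A

# Hypothetical non-square pair: P is 2 x 3 and Q is 3 x 2, so both
# products exist, but they do not even have the same size.
P = np.ones((2, 3))
Q = np.ones((3, 2))

print(AB)
print(BA)
print((P @ Q).shape, (Q @ P).shape)
```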

Matrix inversion

The inverse of a matrix A, denoted $\mathbf{A}^{-1}$, is the matrix solving the matrix equation

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$$

where I is the identity matrix.

Only square matrices are invertible, and some square matrices cannot be inverted (“singular” matrices).

$$\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad \mathbf{A}^{-1} = \begin{bmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{bmatrix}$$

$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \mathbf{I}$$
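As a rough check of the worked example, the sketch below inverts A = [[2, 1], [1, 2]] with NumPy and verifies that the product with A gives the identity. The singular matrix S is a made-up example (its second row is twice its first), included only to illustrate a matrix with no inverse.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Numerical inverse; should match (1/3) * [[2, -1], [-1, 2]].
A_inv = np.linalg.inv(A)

# Multiplying a matrix by its inverse recovers the identity matrix.
product = A @ A_inv

# Hypothetical singular matrix: the rows are proportional, so the
# determinant is zero and no inverse exists.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
S_is_singular = np.isclose(np.linalg.det(S), 0.0)

print(A_inv)
print(product)
print(S_is_singular)
```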

The covariance matrix

A multivariate sample is described by a covariance matrix, whose diagonal elements give the sample variances for each measured variable ($s_i^2$), and whose off-diagonal elements are the sample covariances between pairs of variables ($c_{ik}$).

$$s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2, \qquad c_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)(x_{kj} - \bar{x}_k)$$

$$\mathbf{C} = \begin{bmatrix} s_1^2 & c_{12} & \cdots & c_{1m} \\ c_{21} & s_2^2 & \cdots & c_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & s_m^2 \end{bmatrix}$$

Calculating the sample covariance matrix

$$\mathbf{X} = \begin{bmatrix} 1 & 1 \\ 2 & 7 \\ 3 & 4 \end{bmatrix} \quad (\text{columns } x_1,\ x_2), \qquad \bar{x}_1 = 2, \quad \bar{x}_2 = 4$$

$$\mathbf{X}_d = \begin{bmatrix} 1 & 1 \\ 2 & 7 \\ 3 & 4 \end{bmatrix} - \begin{bmatrix} 2 & 4 \\ 2 & 4 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} -1 & -3 \\ 0 & 3 \\ 1 & 0 \end{bmatrix}, \qquad \mathbf{X}_d^T = \begin{bmatrix} -1 & 0 & 1 \\ -3 & 3 & 0 \end{bmatrix}$$

$$\mathbf{SSCP} = \mathbf{X}_d^T \mathbf{X}_d = \begin{bmatrix} SS_1 & CP_{12} \\ CP_{21} & SS_2 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 18 \end{bmatrix}, \qquad \mathbf{C} = \frac{\mathbf{SSCP}}{n - 1}$$
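The calculation can be sketched in a few lines of NumPy. The data matrix below, with rows (1, 1), (2, 7), (3, 4), is an assumed reading of the slide's small example (means 2 and 4, SSCP with entries 2, 3, 3, 18); the variable names are illustrative.

```python
import numpy as np

# Assumed data matrix: 3 sample units (rows), 2 variables x1, x2 (columns).
X = np.array([[1.0, 1.0],
              [2.0, 7.0],
              [3.0, 4.0]])
n = X.shape[0]

x_bar = X.mean(axis=0)   # column means
X_d = X - x_bar          # deviations of each value from its column mean
SSCP = X_d.T @ X_d       # sums of squares (diagonal) and cross-products
C = SSCP / (n - 1)       # sample covariance matrix

print(x_bar)
print(SSCP)
print(C)
```

Dividing the SSCP matrix by n - 1 gives the same result as `np.cov(X, rowvar=False)`.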

The determinant of a matrix: 2 X 2 matrices

The determinant of a matrix A, denoted det(A) or |A|, is a unique number associated with every square matrix.

In multivariate statistics, the determinant of the sample covariance matrix C plays a crucial role in hypothesis testing.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad |\mathbf{A}| = a_{11} a_{22} - a_{12} a_{21}$$

$$\mathbf{C} = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}, \qquad |\mathbf{C}| = 3$$

Matrix inversion and the determinant: 2 X 2 matrices

If a 2 X 2 matrix A is invertible, the elements of its inverse $\mathbf{A}^{-1}$ are obtained by dividing modified elements of A by |A|.

Hence, if |A| = 0, the division is undefined and the matrix is non-invertible or singular.

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad |\mathbf{A}| = a_{11} a_{22} - a_{21} a_{12}, \qquad \mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$

$$\mathbf{D} = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}, \qquad |\mathbf{D}| = 4 \cdot 6 - 2 \cdot 2 = 20, \qquad \mathbf{D}^{-1} = \begin{bmatrix} 6/20 & -2/20 \\ -2/20 & 4/20 \end{bmatrix}$$
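The 2 X 2 rules for the determinant and the inverse translate directly into code. The sketch below uses two illustrative helper functions, `det_2x2` and `inverse_2x2` (names chosen here, not from the lecture), applied to the matrices C = [[1, 1], [1, 4]] and D = [[4, 2], [2, 6]], and compares the results with NumPy.

```python
import numpy as np

def det_2x2(A):
    """Determinant of a 2 x 2 matrix: a11*a22 - a12*a21."""
    return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix: swap the diagonal elements, negate the
    off-diagonal elements, and divide everything by the determinant."""
    d = det_2x2(A)
    if d == 0:
        raise ValueError("matrix is singular: determinant is zero")
    return np.array([[ A[1, 1], -A[0, 1]],
                     [-A[1, 0],  A[0, 0]]]) / d

C = np.array([[1.0, 1.0],
              [1.0, 4.0]])
D = np.array([[4.0, 2.0],
              [2.0, 6.0]])

det_C = det_2x2(C)
det_D = det_2x2(D)
D_inv = inverse_2x2(D)

print(det_C, det_D)
print(D_inv)
```

The explicit check for a zero determinant corresponds to the singular case described above, where the division is undefined.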

Multivariate variance: a geometric interpretation

Univariate variance is a measure of the “volume” occupied by sample points in one dimension.

Multivariate variance involving m variables is the volume occupied by sample points in an m-dimensional space.

Figure: sample points along a single axis X with larger and smaller variance, and the volume occupied by sample points in the plane of X1 and X2.

Multivariate variance: effects of correlations among variables

Correlations between pairs of variables reduce the volume occupied by sample points…

…and hence, reduce the multivariate variance.

Figure: occupied volume in the plane of X1 and X2 for three cases: no correlation, positive correlation, negative correlation.

|C| and the generalized multivariate variance

The determinant of the sample covariance matrix C is a generalized multivariate variance…

… because the squared area of a parallelogram with sides given by the individual standard deviations and angle determined by the correlation between variables equals the determinant of C.

$$\mathbf{C} = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}, \qquad |\mathbf{C}| = 3$$

$$r_{12} = \frac{c_{12}}{s_1 s_2} = \frac{1}{1 \cdot 2} = 0.5 = \cos\theta, \qquad \theta = 60^{\circ}$$

$$\sin 60^{\circ} = \frac{\text{opposite}}{\text{hypotenuse}} = \frac{h}{s_2} = \frac{h}{2}, \qquad h = \sqrt{3}$$

$$\text{Area} = \text{Base} \times \text{Height} = s_1 h = \sqrt{3}, \qquad \text{Area}^2 = 3 = |\mathbf{C}|$$

Figure: parallelogram with sides $s_1$ and $s_2$ and height h.
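The geometric argument can be verified numerically. The sketch below assumes C = [[1, 1], [1, 4]], takes the parallelogram's sides to be the two standard deviations with the angle between them given by the arccosine of the correlation, and checks that the squared area equals the determinant.

```python
import math
import numpy as np

C = np.array([[1.0, 1.0],
              [1.0, 4.0]])

# Sides of the parallelogram: the two standard deviations.
s1 = math.sqrt(C[0, 0])
s2 = math.sqrt(C[1, 1])

# Correlation, and the angle whose cosine is the correlation.
r12 = C[0, 1] / (s1 * s2)
theta = math.acos(r12)

# Height relative to the base s1, then area = base * height.
h = s2 * math.sin(theta)
area = s1 * h

det_C = np.linalg.det(C)

print(r12, math.degrees(theta))
print(area ** 2, det_C)
```

In general the squared area is $s_1^2 s_2^2 (1 - r_{12}^2) = s_1^2 s_2^2 - c_{12}^2$, which is exactly the 2 X 2 determinant of C.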

The use of determinants in multivariate analysis

For a univariate sample variance $s_a^2$, the multivariate analog is the determinant of the corresponding sample covariance matrix $\mathbf{C}_a$, i.e., $|\mathbf{C}_a|$…

… and these generalized variances are often used in the calculation of multivariate test statistics, e.g., Wilks’ Λ.

Univariate single-classification ANOVA, k groups

Source of variation   MS
Groups                SSg / (k - 1)
Error                 SSe / (N - k)
Total                 SST / (N - 1)

Test statistic: F = MSg / MSe

Multivariate single-classification ANOVA (MANOVA)

Source of variation   |C|
Groups                |Cg|
Error                 |Ce|
Total                 |CT|

Test statistic: Λ = |Ce| / |CT|

Eigenvalues

The eigenvalues of a p X p matrix A are the p solutions λ, some of which may be zero, to the equation |A - λI| = 0.

The trace of a matrix is the sum of its eigenvalues…

… and the determinant of a matrix is the product of its eigenvalues.

$$\mathbf{A} = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}, \qquad \mathbf{A} - \lambda\mathbf{I} = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} = \begin{bmatrix} 3 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix}$$

$$|\mathbf{A} - \lambda\mathbf{I}| = (3 - \lambda)(2 - \lambda) - 1 = \lambda^2 - 5\lambda + 5 = 0$$
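A quick numerical check ties these statements together. The sketch below assumes the example matrix is A = [[3, 1], [1, 2]], whose characteristic equation is λ² - 5λ + 5 = 0, and confirms that the eigenvalues are the roots of that polynomial, sum to the trace and multiply to the determinant.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# A is symmetric, so eigvalsh applies; it returns real eigenvalues
# in ascending order.
eigenvalues = np.linalg.eigvalsh(A)

# Roots of the characteristic polynomial lambda^2 - 5*lambda + 5.
roots = np.sort(np.roots([1.0, -5.0, 5.0]).real)

trace_A = np.trace(A)
det_A = np.linalg.det(A)

print(eigenvalues)
print(eigenvalues.sum(), trace_A)
print(eigenvalues.prod(), det_A)
```

Both eigenvalues are (5 ± √5) / 2, so their sum is 5 (the trace) and their product is 5 (the determinant).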

Eigenvalues and eigenvectors I

Suppose v is a vector, and L a linear transformation. If L(v) = λv, then v is an eigenvector of L associated with the eigenvalue λ.

e.g., if L is the reflection in the line y = mx, then a vector lying along the line is an eigenvector associated with eigenvalue 1, and a vector perpendicular to the line is an eigenvector associated with eigenvalue -1.

Note that these two eigenvectors are orthogonal!

Figure: the line y = mx, with the two eigenvectors and their images under L.
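The reflection example can be made concrete with a short sketch. It assumes the standard matrix for reflection in the line y = mx, namely (1 / (1 + m²)) [[1 - m², 2m], [2m, m² - 1]], and an arbitrary slope m = 2; the vector names `along` and `across` are illustrative.

```python
import numpy as np

m = 2.0  # arbitrary slope chosen for illustration

# Matrix of the reflection in the line y = m x.
L = np.array([[1 - m**2, 2 * m],
              [2 * m,    m**2 - 1]]) / (1 + m**2)

along = np.array([1.0, m])    # direction of the line itself
across = np.array([-m, 1.0])  # direction perpendicular to the line

# A vector on the line is left unchanged (eigenvalue 1);
# a perpendicular vector is flipped (eigenvalue -1).
image_along = L @ along
image_across = L @ across

print(image_along, along)
print(image_across, -across)
print(along @ across)  # dot product of the two eigenvectors
```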

Eigenvalues and eigenvectors of C

Eigenvectors of the covariance matrix C are orthogonal directed line segments that “span” the variation in the data, and the corresponding (unsigned) eigenvalues are the length of these segments.

… so the product of the eigenvalues is the “volume” occupied by the data, i.e. the determinant of the covariance matrix.

Figure: data in the plane of X1 and X2 for three cases (no correlation, positive correlation, negative correlation), with the eigenvalues λ1 and λ2 marked along the eigenvector directions.
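As a sketch of this idea, the code below decomposes a covariance matrix with `numpy.linalg.eigh` and checks that the eigenvectors are orthogonal and that the product of the eigenvalues equals the determinant. The matrix C = [[1, 1], [1, 4]] is assumed here as an example covariance matrix.

```python
import numpy as np

# Example covariance matrix (symmetric by construction).
C = np.array([[1.0, 1.0],
              [1.0, 4.0]])

# eigh is intended for symmetric matrices: it returns real eigenvalues in
# ascending order and unit-length eigenvectors as the columns of the
# second array.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Orthogonality: V^T V should be the identity matrix.
gram = eigenvectors.T @ eigenvectors

det_C = np.linalg.det(C)

print(eigenvalues)
print(gram)
print(eigenvalues.prod(), det_C)
```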

Displaying multivariate data I: Draftsman’s plots (SPLOM)

Plot pairs of variables against one another.

Advantages: need only 2 plotting dimensions, bivariate relationships among variables are clear.

Problems: no direct information on relationships in higher than 2 dimensions, relationships between objects unclear.

Figure: scatterplot matrix (SPLOM) of the variables TOTLNGTH, ALAR, HEAD and HUMERUS.

Displaying multivariate data II: multiple 3-D plots

Plot 3 variables against one another.

Advantages: trivariate relationships among variables are clear.

Problems: no direct information on relationships in higher than 3 dimensions, relationships between objects unclear.

Displaying multivariate data III: plotting index variables

Generate index variables that combine information from several measured variables, then plot these variables.

Advantages: 2-D plots make relationships among variables clear.

Disadvantages: relationships among objects unclear, key information may be lost in data reduction.

Figure: scatterplot of FACTOR(1) against FACTOR(2).

Displaying multivariate data IV: Icon plots

Used to visualize relationships among objects, e.g. different canine groups.

Advantages: All variables displayed simultaneously.

Problems: order of display of variables arbitrary, and impressions may depend on order. Relationships among variables may be unclear.

Figure: icon plots of variables X1 to X6 for six canine groups: Cuon, Dingo, Prehistoric dog, Chinese wolf, Golden jackal, Modern dog.

Displaying multivariate data V: profile plots

Represent objects by lines, histograms or Fourier plots.

Advantages: All variables displayed simultaneously.

Problems: order of display of variables arbitrary, and impressions may depend on order. Relationships among variables may be unclear.

Figure: Fourier plot of Fourier components against angle (-180 to 180 degrees), one curve per group (GROUP$): Cuon, Dingo, Pre_dog, ch_wolf, gold_jackal, modern_dog.