Introduction to Multivariate Analysis and Multivariate Distances

Hal Whitehead, BIOL4062/5062


• Data matrices
• Problems with data matrices
  – missing values
  – outliers
• Matrices used in multivariate analysis
• Multivariate distances
• Association matrices

The Data Matrix

Variables (columns): A B C D E F G …
Units (rows): 1 2 3 4 5 6 7 …

The Data Matrix

Subject             Units       Variables
Animal behaviour    Animals     Scores on different measures
Community ecology   Plots       Species counts
Palaeontology       Specimens   Measurements on bones
Marine ecology      Stations    Temperature, salinity, species counts, etc.

Visualize Data Matrix as Points in Multidimensional Space

[Figure: units plotted as points; legend ORDER: Rodentia, Primates, Marsupialia, Lagomorpha, Insectivora, Edentata, Chiroptera, Carnivora]

Problems with Data Matrix

• Missing values
• Outliers
• Units not independent
• Many zeros
• Not multivariate normal

Missing Data

Often present in ecological or other biological data. Options:

• delete columns of data matrix
• delete rows of data matrix
• just delete pairs of elements where one is missing
• interpolate

Example data (rows with fewer entries contain missing values; the column positions of the gaps were not preserved):

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$
23-Feb-1985 1985 2 1 1 179 Reg Galapagos
24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos
25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos
07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos
08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

Value annotated on the interpolation slide: 012
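The four options for missing data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy matrix and the function names are hypothetical, `None` marks a missing value, and the interpolation treats rows as equally spaced, which may not hold for irregularly dated field data.

```python
from typing import List, Optional

Row = List[Optional[float]]

def delete_rows(data: List[Row]) -> List[Row]:
    """Keep only units (rows) that have no missing value."""
    return [row for row in data if all(v is not None for v in row)]

def delete_columns(data: List[Row]) -> List[Row]:
    """Keep only variables (columns) that have no missing value."""
    keep = [j for j in range(len(data[0]))
            if all(row[j] is not None for row in data)]
    return [[row[j] for j in keep] for row in data]

def complete_pairs(data: List[Row], j: int, k: int):
    """Pairwise deletion: values of variables j and k where both are present."""
    return [(row[j], row[k]) for row in data
            if row[j] is not None and row[k] is not None]

def interpolate_column(data: List[Row], j: int) -> Row:
    """Linearly interpolate interior gaps in column j (rows assumed equally spaced)."""
    col = [row[j] for row in data]
    known = [i for i, v in enumerate(col) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            col[i] = col[a] + (col[b] - col[a]) * (i - a) / (b - a)
    return col

# Hypothetical toy matrix: 4 units x 3 variables.
toy = [
    [1.0, 10.0, 5.0],
    [2.0, None, 6.0],
    [3.0, 14.0, None],
    [4.0, 16.0, 8.0],
]
```

Deleting rows or columns discards observed values along with the gaps, whereas pairwise deletion keeps every complete pair at the cost of basing different matrix entries on different subsets of units.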

Outliers

• Statistical packages often indicate “outliers”

  WARNING: Case 86 has large leverage (Leverage = 0.252)

• If plausibly
  – the result of biological or other processes outside the scope of the model being used
  – or the result of measurement or coding error
  – they may be discarded
• Otherwise they should be retained
  – (perhaps use a different model)

Problems with Data Matrix

• Missing values
• Outliers
• Units not independent
  – Not a problem unless doing tests
• Many zeros
  – Special methods (e.g. correspondence analysis)
• Not multivariate normal
  – Transform if possible

Uses of Multivariate Analysis

• Large data sets
  – simplify
  – summarize
  – find patterns
• Analyze groupings of units
• Find groupings of units
• Examine relationships between variables

Some Matrices Used in Multivariate Analysis

• Data matrix: rectangular
  – units i = 1, …, n
  – variables j, k
• Covariance matrix between variables: symmetric (square/triangular)
  – $c_{jk} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) / (n - 1)$, where $\bar{x}_k = \mathrm{mean}(x_{ik})$
• Correlation matrix between variables: symmetric (square/triangular)
  – $r_{jk} = c_{jk} / (S_j S_k)$, where $S_k = \mathrm{SD}(x_{ik})$
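As a minimal sketch of the covariance and correlation definitions, the following standard-library Python computes both matrices from a data matrix stored as a list of rows (units) by columns (variables). The function names and the small example matrix `X` are hypothetical.

```python
import math

def column_means(X):
    """Mean of each variable (column) over the n units (rows)."""
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def covariance_matrix(X):
    """c_jk = sum_i (x_ij - mean_j)(x_ik - mean_k) / (n - 1)."""
    n, p = len(X), len(X[0])
    m = column_means(X)
    return [[sum((row[j] - m[j]) * (row[k] - m[k]) for row in X) / (n - 1)
             for k in range(p)] for j in range(p)]

def correlation_matrix(X):
    """r_jk = c_jk / (S_j S_k), with S_j = sqrt(c_jj)."""
    C = covariance_matrix(X)
    p = len(C)
    s = [math.sqrt(C[j][j]) for j in range(p)]
    return [[C[j][k] / (s[j] * s[k]) for k in range(p)] for j in range(p)]

# Hypothetical data: 4 units, 2 variables; the second is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
C = covariance_matrix(X)
R = correlation_matrix(X)
```

Because the second variable is a rescaled copy of the first, the covariances differ by the scale factor while the correlation is 1 (up to rounding), which is the sense in which correlation removes the units of measurement.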

Data Matrix

LMASS   LFAT    LCNS    LMUSCLE  LHEART  LBONE
 8.95    7.02    4.65    8.38     3.32    6.45
 8.72    6.60    4.40    8.19     3.24    6.31
 9.14    6.33    4.45    8.57     4.39    6.78
 5.21    1.13    1.90    4.65     0.63    3.09
 6.94    4.19    2.89    6.37     2.03    4.39
 8.71    6.92    4.07    7.98     3.59    6.25
 2.40   -1.51   -1.05    1.71    -1.90    0.31
 3.70    1.33   -0.04    2.89    -0.76    1.50
 4.15    1.83    0.19    3.37    -0.30    2.09
 1.98   -1.39   -0.99    1.35    -2.30   -0.37
 …       …       …       …        …       …

Covariance Matrix

          LMASS   LFAT    LCNS    LMUSCLE  LHEART  LBONE
LMASS     5.38
LFAT      6.49    8.23
LCNS      4.15    4.93    3.33
LMUSCLE   5.44    6.51    4.21    5.51
LHEART    4.67    5.63    3.63    4.72     4.12
LBONE     5.25    6.31    4.05    5.31     4.57    5.16

Correlation Matrix

          LMASS   LFAT    LCNS    LMUSCLE  LHEART  LBONE
LMASS     1
LFAT      0.97    1
LCNS      0.98    0.94    1
LMUSCLE   1.00    0.97    0.98    1
LHEART    0.99    0.97    0.98    0.99     1
LBONE     1.00    0.97    0.98    1.00     0.99    1

Multivariate distances between units or groups of units

1. Euclidean distance

$d_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$

p: number of variables

[Figure: plot with legend SPECIES: 1, 2, 3]
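The Euclidean distance between two units is simple to compute directly. A minimal sketch, in which the function name and the example vectors are hypothetical:

```python
import math

def euclidean(xi, xj):
    """Square root of the summed squared differences over the p variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Two hypothetical units measured on p = 2 variables.
d = euclidean([0.0, 0.0], [3.0, 4.0])
```

Every variable contributes in its own units, so a variable with a large numerical range will dominate the distance.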

Multivariate distances between units or groups of units

2. Penrose distance

$P_{ij} = \sum_{k=1}^{p} \frac{(x_{ik} - x_{jk})^2}{p \, S_k^2}$

p: number of variables
$S_k^2$: variance of $x_{ik}$

Corrects for different units and different ranges of the variables

[Figure: plot with legend SPECIES: 1, 2, 3]
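The Penrose distance divides each squared difference by p times the variance of that variable. A minimal sketch of that definition follows; the function name, the example vectors, and the variances passed in are hypothetical, and in practice the variances would be estimated from the data matrix.

```python
def penrose(xi, xj, variances):
    """Sum over variables of (x_ik - x_jk)^2 / (p * S_k^2)."""
    p = len(xi)
    return sum((a - b) ** 2 / (p * v) for a, b, v in zip(xi, xj, variances))

# Second variable measured in its original units ...
p_original = penrose([0.0, 0.0], [2.0, 30.0], [1.0, 225.0])
# ... and after multiplying it by 10 (its variance scales by 100).
p_rescaled = penrose([0.0, 0.0], [2.0, 300.0], [1.0, 22500.0])
```

Both calls give the same value, which illustrates the correction for units: rescaling a variable rescales its variance by the square of the factor, so the ratio is unchanged.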

Multivariate distances between units or groups of units

3. Mahalanobis distance

$D_{ij}^2 = \sum_{r=1}^{p} \sum_{s=1}^{p} (x_{ir} - x_{jr}) \, v_{rs} \, (x_{is} - x_{js})$

p: number of variables
$v_{rs}$: elements of the inverse of the covariance matrix

Corrects for correlations between variables

[Figure: plot with legend SPECIES: 1, 2, 3]
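The squared Mahalanobis distance can be sketched without forming the inverse explicitly: solving C y = d for the difference vector d gives y = C⁻¹ d, and D² is then the dot product of d and y. The helper names below are hypothetical, the covariance matrix in the example is made up for illustration, and the hand-written Gaussian elimination stands in for what a linear-algebra library would normally do.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y

def mahalanobis_sq(xi, xj, C):
    """D^2 = d' C^{-1} d, where d = xi - xj and C is the covariance matrix."""
    d = [a - b for a, b in zip(xi, xj)]
    y = solve(C, d)
    return sum(a * b for a, b in zip(d, y))

# Hypothetical covariance matrix: two positively correlated variables.
C = [[1.0, 0.5], [0.5, 1.0]]
along = mahalanobis_sq([1.0, 1.0], [0.0, 0.0], C)     # difference along the correlation
against = mahalanobis_sq([1.0, -1.0], [0.0, 0.0], C)  # difference against it
```

The two differences have the same Euclidean length, but the one that runs against the correlation yields the larger D², which is how the measure corrects for correlations between variables. With an identity covariance matrix it reduces to the squared Euclidean distance.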

3 species of iris, 4 measurements

• Euclidean distances

       A      B      C
  A    0
  B    3.2    0
  C    4.8    1.6    0

• Penrose distances

       A      B      C
  A    0
  B    2.8    0
  C    3.9    1.5    0

• Mahalanobis distances

       A      B      C
  A    0
  B    89.9   0
  C    179.4  17.2   0

[Figure: plot with legend SPECIES: A, B, C]

The Standard Data Matrix

Variables (columns): A B C D E F G …
Units (rows): 1 2 3 4 5 6 7 …

The Association Matrix

Units (columns): A B C D E F G …
Units (rows): A B C D E F G …
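One way to see the step from a data matrix (units by variables) to an association matrix (units by units) is to compute a distance between every pair of units. The sketch below uses Euclidean distance; the function name and the toy data are hypothetical.

```python
import math

def distance_matrix(X):
    """Units-by-units matrix of Euclidean distances between rows of X."""
    n = len(X)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

# Hypothetical data matrix: 3 units (rows) x 2 variables (columns).
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(X)
```

The result is square, has zeros on the diagonal, and is symmetric, so only one triangle needs to be stored or displayed.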

Association matrices

• Social structure
  – association between individuals
• Community ecology
  – similarity between species / sites
  – dissimilarities between species / sites
• Genetic distances
• Correlation matrices
• Covariance matrices
• Distance matrices
  – Euclidean, Penrose, Mahalanobis

Similarity / Dissimilarity

Association matrices: Dissimilarity / Similarity

Similarity: genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)

GRI  -0.24
VAX   0.02   0.08
KRI   0.02  -0.04  -0.19
MYR  -0.27   0.44  -0.03  -0.11
WOW   0.22   0.11   0.32  -0.10   0.10
HOB  -0.04   0.11  -0.17  -0.13  -0.08  -0.12
WBE   0.15   0.07  -0.08   0.08  -0.08   0.23   0.13
HOR  -0.08   0.21  -0.14  -0.23   0.18   0.12   0.11   0.26
AJA  -0.24   0.23  -0.04  -0.16  -0.01  -0.16   0.07   0.25   0.32
PIK  -0.11   0.35  -0.07   0.04   0.02  -0.05   0.09   0.60   0.21   0.27
ANV  -0.05  -0.23  -0.39  -0.39  -0.21  -0.13  -0.41   0.11   0.11   0.02  -0.06
VEE   0.14   0.02   0.15  -0.11  -0.08   0.00  -0.09  -0.05   0.06   0.01  -0.17  -0.17
      LAT    GRI    VAX    KRI    MYR    WOW    HOB    WBE    HOR    AJA    PIK    ANV

Dissimilarity: Mahalanobis distances between iris species

       A      B      C
  A    0
  B    89.9   0
  C    179.4  17.2   0

Association matrices: Symmetric / Asymmetric

Symmetric: genetic relatedness among bottlenose dolphins (Krutzen et al. 2003), the same lower-triangular matrix shown under Dissimilarity / Similarity

Asymmetric: grooming rates of capuchin monkeys (Perry 1996)

                Recipient
Actor    A     S     N     D     W     T
A        -     58    35    21    23    004
S        416   -     286   181   90    74
N        103   255   -     96    99    43
D        233   93    105   -     134   69
W        212   152   146   251   -     104
T        25    29    37    36    53    -
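The practical difference between the two kinds of matrix is that a symmetric one is fully described by one triangle, while an asymmetric one (actor by recipient) needs every off-diagonal cell. A minimal sketch, with hypothetical function names and made-up values:

```python
def from_lower_triangle(rows):
    """Expand lower-triangle rows (diagonal included) into a full symmetric matrix."""
    n = len(rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            full[i][j] = v
            full[j][i] = v
    return full

def is_symmetric(A, tol=1e-12):
    """True if A[i][j] equals A[j][i] (within tol) for every pair i, j."""
    n = len(A)
    return all(abs(A[i][j] - A[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))

# Hypothetical distance matrix given as a lower triangle.
distances = from_lower_triangle([[0.0], [2.0, 0.0], [5.0, 3.0, 0.0]])
# Hypothetical directed rates: row = actor, column = recipient.
rates = [[0.0, 1.0], [4.0, 0.0]]
```

Here `is_symmetric(distances)` holds and `is_symmetric(rates)` does not, because what the first individual directs at the second differs from what it receives.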

Page 4: Introduction to  Multivariate Analysis and Multivariate Distances

The Data Matrix

Subject Units VariablesAnimal behaviour Animals Scores on different measures

Communityecology

Plots Species counts

Palaentology Specimens Measurements on bones

Marine ecology Stations Temperature salinityspecies counts etc

Visualize Data Matrix asPoints in multidimensional space

RodentiaPrimatesMarsupialiaLagomorphaInsectivoraEdentataChiropteraCarnivora

ORDER

Problems with Data Matrix

bull Missing valuesbull Outliersbull Units not independentbull Many zerosbull Not multivariate normal

Missing DataOften present in ecological or other biological data

bull delete columns of data matrix

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$23-Feb-1985 1985 2 1 1 179 Reg Galapagos24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

Missing DataOften present in ecological or other biological data

bull delete columns of data matrixbull delete rows of data matrix

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$23-Feb-1985 1985 2 1 1 179 Reg Galapagos24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

Missing DataOften present in ecological or other biological data

bull delete columns of data matrixbull delete rows of data matrixbull just delete pairs of elements where one is missing

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$23-Feb-1985 1985 2 1 1 179 Reg Galapagos24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

Missing DataOften present in ecological or other biological data

bull delete columns of data matrixbull delete rows of data matrixbull just delete pairs of elements where one is missingbull interpolate

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$23-Feb-1985 1985 2 1 1 179 Reg Galapagos24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

012

Outliersbull Statistical packages often indicate ldquooutliersrdquo

WARNING Case 86 has large leverage (Leverage = 0252)

bull If plausiblyndash the result of biological or other processes outside the scope of

the model being usedndash or the results of measurement or coding errorndash they may be discarded

bull Otherwise they should be retainedndash (perhaps use a different model)

Problems with Data Matrixbull Missing valuesbull Outliersbull Units not independent

ndash Not a problem unless doing testsbull Many zeros

ndash Special methods (eg correspondence analysis)bull Not multivariate normal

ndash Transform if possible

Uses of Multivariate Analysisbull Large data sets

ndash simplifyndash summarizendash find patterns

bull Analyze groupings of units

bull Find groupings of unitsbull Examine relationships

between variables

Some Matrices Used inMultivariate Analysis

bull Data matrix rectangularndash units i=1hellipnndash variables j k

bull Covariance matrix between variables symmetric (squaretriangular)ndash cjk= Σ (xij-xj) (xik-xk) (n-1) [xk = mean(xik)]

bull Correlation matrix between variables symmetric (squaretriangular)ndash rjk=cjk(Sj Sk) [Sk = SD(xik)]

Data MatrixLMASS LFAT LCNS LMUSCLELHEART LBONE

895 702 465 838 332 645872 660 440 819 324 631914 633 445 857 439 678521 113 190 465 063 309694 419 289 637 203 439871 692 407 798 359 625240 -151 -105 171 -190 031370 133 -004 289 -076 150415 183 019 337 -030 209198 -139 -099 135 -230 -037

hellip hellip hellip hellip hellip hellip

Covariance Matrix

LMASS LFAT LCNS LMUSCLELHEART LBONE895 702 465 838 332 645872 660 440 819 324 631914 633 445 857 439 678521 113 190 465 063 309694 419 289 637 203 439871 692 407 798 359 625240 -151 -105 171 -190 031370 133 -004 289 -076 150415 183 019 337 -030 209198 -139 -099 135 -230 -037

hellip hellip hellip hellip hellip hellip

LMASS LFAT LCNS LMUSCLELHEART LBONELMASS 538LFAT 649 823LCNS 415 493 333LMUSCLE 544 651 421 551LHEART 467 563 363 472 412LBONE 525 631 405 531 457 516

Correlation Matrix

          LMASS   LFAT    LCNS    LMUSCLE  LHEART  LBONE
LMASS     1
LFAT      0.97    1
LCNS      0.98    0.94    1
LMUSCLE   1.00    0.97    0.98    1
LHEART    0.99    0.97    0.98    0.99     1
LBONE     1.00    0.97    0.98    1.00     0.99    1

Multivariate distances between units or groups of units

1. Euclidean distance

$d_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$

p variables

[Figure legend: SPECIES 1, 2, 3]
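As a small worked illustration of the Euclidean distance between two units measured on the same p variables, here is a plain-Python sketch. The function name and the two example units are hypothetical.

```python
from math import sqrt


def euclidean_distance(x_i, x_j):
    """d_ij = sqrt(sum over the p variables of (x_ik - x_jk)^2)."""
    if len(x_i) != len(x_j):
        raise ValueError("both units must be measured on the same p variables")
    return sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Hypothetical units measured on p = 3 variables.
unit_1 = [1.0, 2.0, 3.0]
unit_2 = [4.0, 6.0, 3.0]
d_12 = euclidean_distance(unit_1, unit_2)  # sqrt(9 + 16 + 0)
```

Because the squared differences are summed as they are, a variable recorded in larger units dominates the result.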

Multivariate distances between units or groups of units

2. Penrose distance

$P_{ij} = \sum_{k=1}^{p} \frac{(x_{ik} - x_{jk})^2}{p \, S_k^2}$

p variables
$S_k^2$: variance of $x_{ik}$

Corrects for different units, different ranges of units of variables

[Figure legend: SPECIES 1, 2, 3]
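The scaling by the variances can be seen in a short sketch. It assumes the form of the Penrose distance with each squared difference divided by p times the variance of that variable. The function name and the numbers are hypothetical, and the variances are passed in rather than estimated.

```python
def penrose_distance(x_i, x_j, variances):
    """P_ij = sum_k (x_ik - x_jk)^2 / (p * S_k^2), with S_k^2 the variance of variable k."""
    p = len(variances)
    if not (len(x_i) == len(x_j) == p):
        raise ValueError("units and variances must all have length p")
    return sum((a - b) ** 2 / (p * v) for a, b, v in zip(x_i, x_j, variances))


# Hypothetical example: two units on p = 2 variables, second variable in cm.
unit_1_cm = [10.0, 2.0]
unit_2_cm = [12.0, 3.0]
var_cm = [4.0, 1.0]

# The same measurements with the second variable in mm:
# values are multiplied by 10, so the variance is multiplied by 100.
unit_1_mm = [10.0, 20.0]
unit_2_mm = [12.0, 30.0]
var_mm = [4.0, 100.0]

p_cm = penrose_distance(unit_1_cm, unit_2_cm, var_cm)
p_mm = penrose_distance(unit_1_mm, unit_2_mm, var_mm)
```

Both calls give the same value, which is the sense in which this distance corrects for the units of the variables.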

Multivariate distances between units or groups of units

3. Mahalanobis distance

$D_{ij}^2 = \sum_{r=1}^{p} \sum_{s=1}^{p} (x_{ir} - x_{jr}) \, v_{rs} \, (x_{is} - x_{js})$

p variables
$v_{rs}$: elements of inverse of covariance matrix

Corrects for correlations between variables

[Figure legend: SPECIES 1, 2, 3]
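A plain-Python sketch of the double sum may help show what "corrects for correlations" means. The helper names, the covariance matrix and the units are hypothetical. The inverse is computed by Gauss-Jordan elimination only to keep the example self-contained, and the function returns the squared distance D², as in the formula.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(value) for value in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_squared(x_i, x_j, covariance):
    """D^2_ij = sum_r sum_s (x_ir - x_jr) * v_rs * (x_is - x_js), v = inverse covariance."""
    v = invert(covariance)
    diff = [a - b for a, b in zip(x_i, x_j)]
    p = len(diff)
    return sum(diff[r] * v[r][s] * diff[s] for r in range(p) for s in range(p))


# Hypothetical covariance matrix: two positively correlated variables.
cov = [[1.0, 0.5], [0.5, 1.0]]
origin = [0.0, 0.0]
along = mahalanobis_squared([1.0, 1.0], origin, cov)     # difference follows the correlation
against = mahalanobis_squared([1.0, -1.0], origin, cov)  # difference runs against it
```

The two units are equally far from the origin in Euclidean terms, but the one lying against the correlation is further away in Mahalanobis terms.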

3 species of iris, 4 measurements

• Euclidean distances

       A      B      C
  A    0
  B    3.2    0
  C    4.8    1.6    0

• Penrose distances

       A      B      C
  A    0
  B    2.8    0
  C    3.9    1.5    0

• Mahalanobis distances

       A      B      C
  A    0
  B   89.9    0
  C  179.4   17.2    0

[Figure legend: SPECIES A, B, C]
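The Euclidean entries can be checked with a short sketch, under two assumptions that the slide does not state: that the distances are taken between species means, and that A, B and C are Iris setosa, I. versicolor and I. virginica from Fisher's iris data. With the usual species means (in cm) the values come out at about 3.2, 4.8 and 1.6.

```python
from math import sqrt

# Species means for sepal length, sepal width, petal length, petal width (cm)
# in Fisher's iris data. The mapping to the labels A, B, C is an assumption.
means = {
    "A": [5.006, 3.428, 1.462, 0.246],  # Iris setosa
    "B": [5.936, 2.770, 4.260, 1.326],  # Iris versicolor
    "C": [6.588, 2.974, 5.552, 2.026],  # Iris virginica
}


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


pairs = [("A", "B"), ("A", "C"), ("B", "C")]
distances = {
    pair: round(euclidean(means[pair[0]], means[pair[1]]), 1) for pair in pairs
}
```

The Penrose and Mahalanobis entries also need variances and a covariance matrix, which are not given here, so they are not recomputed.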

The Standard Data Matrix

Variables (columns): A B C D E F G …
Units (rows): 1 2 3 4 5 6 7 …

The Association Matrix

Units (columns): A B C D E F G …
Units (rows): A B C D E F G …
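One way to get from a standard data matrix (units by variables) to an association matrix (units by units) is to compute a distance for every pair of units. The sketch below uses Euclidean distance. The function name and the data are hypothetical, and any other distance or similarity measure could be substituted.

```python
from math import sqrt


def distance_matrix(data):
    """Return the n x n matrix of Euclidean distances between the rows (units) of data."""
    n = len(data)
    return [
        [
            sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical data matrix: 3 units (rows) by 2 variables (columns).
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
assoc = distance_matrix(data)
```

The result has one row and one column per unit, zeros on the diagonal, and the same value at positions (i, j) and (j, i).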

Association matrices

• Social structure
  – association between individuals
• Community ecology
  – similarity between species / sites
  – dissimilarities between species / sites
• Genetic distances
• Correlation matrices
• Covariance matrices
• Distance matrices
  – Euclidean, Penrose, Mahalanobis

Similarity / Dissimilarity

Association matrices: Dissimilarity / Similarity

Genetic relatedness among bottlenose dolphins (Krützen et al. 2003)

       LAT    GRI    VAX    KRI    MYR    WOW    HOB    WBE    HOR    AJA    PIK    ANV
GRI   -0.24
VAX    0.02   0.08
KRI    0.02  -0.04  -0.19
MYR   -0.27   0.44  -0.03  -0.11
WOW    0.22   0.11   0.32  -0.10   0.10
HOB   -0.04   0.11  -0.17  -0.13  -0.08  -0.12
WBE    0.15   0.07  -0.08   0.08  -0.08   0.23   0.13
HOR   -0.08   0.21  -0.14  -0.23   0.18   0.12   0.11   0.26
AJA   -0.24   0.23  -0.04  -0.16  -0.01  -0.16   0.07   0.25   0.32
PIK   -0.11   0.35  -0.07   0.04   0.02  -0.05   0.09   0.60   0.21   0.27
ANV   -0.05  -0.23  -0.39  -0.39  -0.21  -0.13  -0.41   0.11   0.11   0.02  -0.06
VEE    0.14   0.02   0.15  -0.11  -0.08   0.00  -0.09  -0.05   0.06   0.01  -0.17  -0.17

Mahalanobis distances between iris species

       A      B      C
  A    0
  B   89.9    0
  C  179.4   17.2    0

Association matrices: Symmetric / Asymmetric

Symmetric: genetic relatedness among bottlenose dolphins (Krützen et al. 2003), the lower-triangular matrix of the previous slide.

Asymmetric: grooming rates of capuchin monkeys (Perry 1996)

         Recipient
Actor    A      S      N      D      W      T
A        -      58     35     21     23     004
S        416    -      286    181    90     74
N        103    255    -      96     99     43
D        233    93     105    -      134    69
W        212    152    146    251    -      104
T        25     29     37     36     53     -
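Whether an association matrix is symmetric can be checked by comparing each entry with its mirror image across the diagonal. The sketch below uses two small hypothetical matrices rather than the published values. The function name is an assumption, and `None` marks the undefined diagonal of an actor-by-recipient table.

```python
def is_symmetric(matrix, tol=1e-9):
    """True if matrix[i][j] matches matrix[j][i] for every off-diagonal pair.

    Diagonal entries are never inspected, so they may be None.
    """
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical distance matrix: the distance from i to j equals that from j to i.
distances = [
    [0.0, 2.0, 4.0],
    [2.0, 0.0, 3.0],
    [4.0, 3.0, 0.0],
]

# Hypothetical directed rates (actor in rows, recipient in columns):
# what i directs at j need not equal what j directs at i.
rates = [
    [None, 5.0, 2.0],
    [9.0, None, 1.0],
    [0.0, 3.0, None],
]

symmetric_result = is_symmetric(distances)
asymmetric_result = is_symmetric(rates)
```

A symmetric matrix can be stored as a single triangle, while an asymmetric one needs the full square.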

Page 8: Introduction to  Multivariate Analysis and Multivariate Distances

Missing DataOften present in ecological or other biological data

bull delete columns of data matrixbull delete rows of data matrix

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$23-Feb-1985 1985 2 1 1 179 Reg Galapagos24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

Missing DataOften present in ecological or other biological data

bull delete columns of data matrixbull delete rows of data matrixbull just delete pairs of elements where one is missing

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$23-Feb-1985 1985 2 1 1 179 Reg Galapagos24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

Missing DataOften present in ecological or other biological data

bull delete columns of data matrixbull delete rows of data matrixbull just delete pairs of elements where one is missingbull interpolate

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$23-Feb-1985 1985 2 1 1 179 Reg Galapagos24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

012

Outliersbull Statistical packages often indicate ldquooutliersrdquo

WARNING Case 86 has large leverage (Leverage = 0252)

bull If plausiblyndash the result of biological or other processes outside the scope of

the model being usedndash or the results of measurement or coding errorndash they may be discarded

bull Otherwise they should be retainedndash (perhaps use a different model)

Problems with Data Matrixbull Missing valuesbull Outliersbull Units not independent

ndash Not a problem unless doing testsbull Many zeros

ndash Special methods (eg correspondence analysis)bull Not multivariate normal

ndash Transform if possible

Uses of Multivariate Analysisbull Large data sets

ndash simplifyndash summarizendash find patterns

bull Analyze groupings of units

bull Find groupings of unitsbull Examine relationships

between variables

Some Matrices Used inMultivariate Analysis

bull Data matrix rectangularndash units i=1hellipnndash variables j k

bull Covariance matrix between variables symmetric (squaretriangular)ndash cjk= Σ (xij-xj) (xik-xk) (n-1) [xk = mean(xik)]

bull Correlation matrix between variables symmetric (squaretriangular)ndash rjk=cjk(Sj Sk) [Sk = SD(xik)]

Data MatrixLMASS LFAT LCNS LMUSCLELHEART LBONE

895 702 465 838 332 645872 660 440 819 324 631914 633 445 857 439 678521 113 190 465 063 309694 419 289 637 203 439871 692 407 798 359 625240 -151 -105 171 -190 031370 133 -004 289 -076 150415 183 019 337 -030 209198 -139 -099 135 -230 -037

hellip hellip hellip hellip hellip hellip

Covariance Matrix

(computed from the data matrix of LMASS, LFAT, LCNS, LMUSCLE, LHEART and LBONE)

          LMASS   LFAT   LCNS   LMUSCLE  LHEART  LBONE
LMASS     5.38
LFAT      6.49    8.23
LCNS      4.15    4.93   3.33
LMUSCLE   5.44    6.51   4.21   5.51
LHEART    4.67    5.63   3.63   4.72     4.12
LBONE     5.25    6.31   4.05   5.31     4.57    5.16

Correlation Matrix

(computed from the same data matrix, via the covariance matrix)

          LMASS   LFAT   LCNS   LMUSCLE  LHEART  LBONE
LMASS     1
LFAT      0.97    1
LCNS      0.98    0.94   1
LMUSCLE   1       0.97   0.98   1
LHEART    0.99    0.97   0.98   0.99     1
LBONE     1       0.97   0.98   1        0.99    1

Multivariate distances between units or groups of units

1. Euclidean distance

$d_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$   (p variables)

[Figure: plot with legend SPECIES: 1, 2, 3]
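A minimal sketch of the Euclidean distance between two units, and of the unit-by-unit distance matrix built from it. The helper names and the three toy units are assumptions for illustration only.

```python
from math import sqrt

def euclidean(u, v):
    """d_ij = sqrt(sum_k (x_ik - x_jk)^2) over the p variables."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(X, dist):
    """Square matrix of distances between all pairs of rows (units) of X."""
    n = len(X)
    return [[dist(X[i], X[j]) for j in range(n)] for i in range(n)]

# Toy data (made up): 3 units measured on 2 variables.
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(X, euclidean)
```

Because each squared difference enters with equal weight, a variable recorded in larger units dominates the result, which is the motivation for the standardized distances that follow.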

Multivariate distances between units or groups of units

2. Penrose distance

$P_{ij} = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2 / (p \, S_k^2)$   (p variables; $S_k^2$ = variance of $x_{ik}$)

Corrects for different units and different ranges of units of variables

[Figure: plot with legend SPECIES: 1, 2, 3]
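The Penrose formula can be sketched as below. The example assumes that the variance of each variable is estimated across all units in the data matrix (other choices, such as a pooled within-group variance, are possible); the function names and toy data are hypothetical.

```python
def variances(X):
    """Sample variance S_k^2 of each variable (column) of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    return [sum((row[k] - means[k]) ** 2 for row in X) / (n - 1) for k in range(p)]

def penrose(u, v, var):
    """P_ij = sum_k (x_ik - x_jk)^2 / (p * S_k^2)."""
    p = len(u)
    return sum((a - b) ** 2 / (p * s2) for a, b, s2 in zip(u, v, var))

# Toy data (made up): variable 2 is variable 1 expressed in units 100 times smaller.
X = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
var = variances(X)
P01 = penrose(X[0], X[1], var)
P02 = penrose(X[0], X[2], var)
```

In this toy case both variables contribute equally to the distance even though their raw scales differ by a factor of 100.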

Multivariate distances between units or groups of units

3. Mahalanobis distance

$D_{ij}^2 = \sum_{r=1}^{p} \sum_{s=1}^{p} (x_{ir} - x_{jr}) \, v_{rs} \, (x_{is} - x_{js})$   (p variables; $v_{rs}$ = elements of the inverse of the covariance matrix)

Corrects for correlations between variables

[Figure: plot with legend SPECIES: 1, 2, 3]
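A sketch of the double sum above in plain Python. The small Gauss-Jordan inversion routine, the function names, and the 2 x 2 covariance matrix are assumptions chosen to keep the example self-contained; in practice a linear-algebra library would normally supply the inverse.

```python
def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [[float(x) for x in A[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def mahalanobis_sq(u, v, V):
    """D^2 = sum_r sum_s (x_ir - x_jr) v_rs (x_is - x_js), V = inverse covariance."""
    d = [a - b for a, b in zip(u, v)]
    p = len(d)
    return sum(d[r] * V[r][s] * d[s] for r in range(p) for s in range(p))

C = [[4.0, 2.0], [2.0, 3.0]]   # hypothetical covariance matrix of 2 variables
V = invert(C)
D2 = mahalanobis_sq([1.0, 1.0], [0.0, 0.0], V)
# With an identity covariance matrix, D^2 reduces to the squared Euclidean distance.
D2_identity = mahalanobis_sq([3.0, 4.0], [0.0, 0.0], invert([[1.0, 0.0], [0.0, 1.0]]))
```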

3 species of iris, 4 measurements

• Euclidean distances

     A      B      C
A    0
B    3.2    0
C    4.8    1.6    0

• Penrose distances

     A      B      C
A    0
B    2.8    0
C    3.9    1.5    0

• Mahalanobis distances

     A      B      C
A    0
B    89.9   0
C    179.4  17.2   0

[Figure: plot with legend SPECIES: A, B, C]

The Standard Data Matrix

Columns (Variables): A B C D E F G …
Rows (Units): 1 2 3 4 5 6 7 …

The Association Matrix

Columns (Units): A B C D E F G …
Rows (Units): A B C D E F G …

Association matrices

• Social structure
  – association between individuals
• Community ecology
  – similarity between species / sites
  – dissimilarities between species / sites
• Genetic distances
• Correlation matrices
• Covariance matrices
• Distance matrices
  – Euclidean, Penrose, Mahalanobis

Similarity / Dissimilarity

Association matrices: Dissimilarity / Similarity

Genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)

       LAT    GRI    VAX    KRI    MYR    WOW    HOB    WBE    HOR    AJA    PIK    ANV
GRI  -0.24
VAX   0.02   0.08
KRI   0.02  -0.04  -0.19
MYR  -0.27   0.44  -0.03  -0.11
WOW   0.22   0.11   0.32  -0.10   0.10
HOB  -0.04   0.11  -0.17  -0.13  -0.08  -0.12
WBE   0.15   0.07  -0.08   0.08  -0.08   0.23   0.13
HOR  -0.08   0.21  -0.14  -0.23   0.18   0.12   0.11   0.26
AJA  -0.24   0.23  -0.04  -0.16  -0.01  -0.16   0.07   0.25   0.32
PIK  -0.11   0.35  -0.07   0.04   0.02  -0.05   0.09   0.60   0.21   0.27
ANV  -0.05  -0.23  -0.39  -0.39  -0.21  -0.13  -0.41   0.11   0.11   0.02  -0.06
VEE   0.14   0.02   0.15  -0.11  -0.08   0.00  -0.09  -0.05   0.06   0.01  -0.17  -0.17

Mahalanobis distances between iris species

     A      B      C
A    0
B    89.9   0
C    179.4  17.2   0

Association matrices: Symmetric / Asymmetric

Symmetric: genetic relatedness among bottlenose dolphins (Krutzen et al. 2003), the lower-triangular matrix with rows GRI to VEE and columns LAT to ANV shown on the previous slide.

Asymmetric: grooming rates of capuchin monkeys (Perry 1996)

                 Recipient
Actor    A      S      N      D      W      T
A        -      58     35     21     23     004
S        416    -      286    181    90     74
N        103    255    -      96     99     43
D        233    93     105    -      134    69
W        212    152    146    251    -      104
T        25     29     37     36     53     -
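The distinction between the two kinds of association matrix can be checked mechanically. The sketch below is an illustration under assumptions: the helper names are hypothetical and the numbers are made up rather than taken from the slides. It expands a lower triangle into a full symmetric matrix and tests whether a square matrix equals its transpose.

```python
def symmetric_from_lower(lower, diagonal=0.0):
    """Expand a strict lower triangle into a full symmetric matrix.

    lower[i] holds the values of row i + 1 that lie left of the diagonal.
    """
    n = len(lower) + 1
    A = [[diagonal] * n for _ in range(n)]
    for i, row in enumerate(lower, start=1):
        for j, value in enumerate(row):
            A[i][j] = A[j][i] = value
    return A

def is_symmetric(A, tol=1e-9):
    """True if a_ij equals a_ji for every pair of units."""
    n = len(A)
    return all(abs(A[i][j] - A[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))

# Made-up distance-like lower triangle for 3 units.
S = symmetric_from_lower([[2.0], [5.0, 3.0]])
# Made-up directed rates (actor in rows, recipient in columns).
G = [[0.0, 5.0, 1.0],
     [2.0, 0.0, 4.0],
     [3.0, 6.0, 0.0]]
```

A symmetric matrix can be stored as a triangle without losing information; a directed matrix such as actor-to-recipient rates needs the full square.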

Page 9: Introduction to  Multivariate Analysis and Multivariate Distances

Missing Data

Often present in ecological or other biological data

• delete columns of data matrix
• delete rows of data matrix
• just delete pairs of elements where one is missing

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$
23-Feb-1985 1985 2 1 1 179 Reg Galapagos
24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos
25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos
07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos
08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos

Page 10: Introduction to  Multivariate Analysis and Multivariate Distances

Missing Data

Often present in ecological or other biological data

• delete columns of data matrix
• delete rows of data matrix
• just delete pairs of elements where one is missing
• interpolate

Date Year Mon Area Clan Shitr M24 M12 M3 Clan Area$
23-Feb-1985 1985 2 1 1 179 Reg Galapagos
24-Feb-1985 1985 2 1 1 010 554 547 143 Reg Galapagos
25-Feb-1985 1985 2 1 1 268 183 102 Reg Galapagos
07-Mar-1985 1985 3 1 1 014 386 261 109 Reg Galapagos
08-Mar-1985 1985 3 1 1 553 318 84 Reg Galapagos
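Three of the options listed above (deleting rows, deleting pairs, interpolating) can be sketched as follows. This is a hedged illustration: missing values are represented by `None`, the function names and toy data are assumptions, and the interpolation assumes the rows are in a meaningful order such as by date.

```python
def drop_incomplete_rows(X):
    """Delete every row (unit) that has at least one missing value."""
    return [row for row in X if all(v is not None for v in row)]

def complete_pairs(X, j, k):
    """Keep only the (x_ij, x_ik) pairs in which neither element is missing."""
    return [(row[j], row[k]) for row in X
            if row[j] is not None and row[k] is not None]

def interpolate_column(values):
    """Fill interior gaps linearly between the nearest observed neighbours.

    Leading and trailing gaps are left as None.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (out[b] - out[a]) * (i - a) / (b - a)
    return out

# Toy data matrix (made up): 4 units by 3 variables, None marks a missing value.
X = [[1.0, None, 3.0],
     [2.0, 5.0, 4.0],
     [None, 6.0, 5.0],
     [4.0, 7.0, 6.0]]
rows_kept = drop_incomplete_rows(X)
pairs_01 = complete_pairs(X, 0, 1)
pairs_12 = complete_pairs(X, 1, 2)
filled = interpolate_column([1.0, None, 3.0, None, None, 6.0])
```

Deleting rows keeps 2 of the 4 toy units, while pairwise deletion keeps 3 units for variables 1 and 2, which illustrates why pairwise deletion discards less data.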

Page 11: Introduction to  Multivariate Analysis and Multivariate Distances

Outliers

• Statistical packages often indicate “outliers”:

WARNING: Case 86 has large leverage (Leverage = 0.252)

• If plausibly
  – the result of biological or other processes outside the scope of the model being used
  – or the result of measurement or coding error
  – they may be discarded
• Otherwise they should be retained
  – (perhaps use a different model)

Page 12: Introduction to  Multivariate Analysis and Multivariate Distances

Problems with Data Matrix

• Missing values
• Outliers
• Units not independent
  – Not a problem unless doing tests
• Many zeros
  – Special methods (e.g. correspondence analysis)
• Not multivariate normal
  – Transform if possible

Page 13: Introduction to  Multivariate Analysis and Multivariate Distances

Uses of Multivariate Analysis

• Large data sets
  – simplify
  – summarize
  – find patterns
• Analyze groupings of units
• Find groupings of units
• Examine relationships between variables

Some Matrices Used inMultivariate Analysis

bull Data matrix rectangularndash units i=1hellipnndash variables j k

bull Covariance matrix between variables symmetric (squaretriangular)ndash cjk= Σ (xij-xj) (xik-xk) (n-1) [xk = mean(xik)]

bull Correlation matrix between variables symmetric (squaretriangular)ndash rjk=cjk(Sj Sk) [Sk = SD(xik)]

Data MatrixLMASS LFAT LCNS LMUSCLELHEART LBONE

895 702 465 838 332 645872 660 440 819 324 631914 633 445 857 439 678521 113 190 465 063 309694 419 289 637 203 439871 692 407 798 359 625240 -151 -105 171 -190 031370 133 -004 289 -076 150415 183 019 337 -030 209198 -139 -099 135 -230 -037

hellip hellip hellip hellip hellip hellip

Covariance Matrix

           LMASS   LFAT   LCNS  LMUSCLE  LHEART  LBONE
  LMASS     5.38
  LFAT      6.49   8.23
  LCNS      4.15   4.93   3.33
  LMUSCLE   5.44   6.51   4.21     5.51
  LHEART    4.67   5.63   3.63     4.72    4.12
  LBONE     5.25   6.31   4.05     5.31    4.57   5.16

Correlation Matrix

           LMASS   LFAT   LCNS  LMUSCLE  LHEART  LBONE
  LMASS     1
  LFAT      0.97   1
  LCNS      0.98   0.94   1
  LMUSCLE   1      0.97   0.98     1
  LHEART    0.99   0.97   0.98     0.99    1
  LBONE     1      0.97   0.98     1       0.99   1

Multivariate distances between units or groups of units

1. Euclidean distance

$$d_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$$

p = number of variables

[Figure: plot of units labelled by SPECIES (1, 2, 3)]
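The Euclidean distance between two units is the square root of the summed squared differences over the p variables. A minimal sketch, with an illustrative function name and made-up measurements:

```python
import math


def euclidean_distance(xi, xj):
    """Square root of the sum over variables of squared differences."""
    if len(xi) != len(xj):
        raise ValueError("both units need the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


# Hypothetical units measured on p = 3 variables.
unit_1 = [1.0, 2.0, 3.0]
unit_2 = [4.0, 6.0, 3.0]
d = euclidean_distance(unit_1, unit_2)  # differences 3, 4, 0 give a distance of 5
```

Because the raw differences are summed directly, a variable recorded in large units dominates the result, which is the motivation for the standardized distances that follow.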

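Multivariate distances between units or groups of units

2. Penrose distance

$$P_{ij} = \sum_{k=1}^{p} \frac{(x_{ik} - x_{jk})^2}{p \, S_k^2}$$

p = number of variables
$S_k^2$ = variance of $x_{ik}$

Corrects for different units and different ranges of units of variables.

[Figure: plot of units labelled by SPECIES (1, 2, 3)]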
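A small sketch of the Penrose distance, in which each squared difference is divided by p times the variance of that variable. The function name and numbers are hypothetical, and the variances are simply passed in: the slides do not say whether they should be overall or pooled within-group variances, so that choice is left to the caller.

```python
def penrose_distance(xi, xj, variances):
    """Sum over variables of (x_ik - x_jk)^2 / (p * S_k^2)."""
    p = len(xi)
    if not (len(xj) == p and len(variances) == p):
        raise ValueError("xi, xj and variances must all have length p")
    return sum((a - b) ** 2 / (p * v) for a, b, v in zip(xi, xj, variances))


# Hypothetical units: variable 1 is on a much larger scale than variable 2.
unit_1 = [10.0, 0.5]
unit_2 = [14.0, 0.7]
variances = [4.0, 0.01]  # assumed variance of each variable
penrose = penrose_distance(unit_1, unit_2, variances)
```

In this toy case both variables contribute equally (2.0 each) once scaled by their variances, even though their raw differences are 4.0 and 0.2.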

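Multivariate distances between units or groups of units

3. Mahalanobis distance

$$D_{ij}^2 = \sum_{r=1}^{p} \sum_{s=1}^{p} (x_{ir} - x_{jr}) \, v_{rs} \, (x_{is} - x_{js})$$

p = number of variables
$v_{rs}$ = elements of the inverse of the covariance matrix

Corrects for correlations between variables.

[Figure: plot of units labelled by SPECIES (1, 2, 3)]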
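The squared Mahalanobis distance weights the differences between two units by the elements of the inverse covariance matrix. The sketch below uses only the standard library, so it inverts the matrix with a small Gauss-Jordan routine; the function names and the example covariance matrix are assumptions made for illustration.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_squared(xi, xj, cov):
    """D^2 = sum_r sum_s (x_ir - x_jr) * v_rs * (x_is - x_js), v = inverse of cov."""
    v = invert(cov)
    diff = [a - b for a, b in zip(xi, xj)]
    p = len(diff)
    return sum(diff[r] * v[r][s] * diff[s] for r in range(p) for s in range(p))


# Hypothetical covariance matrix with positively correlated variables.
cov = [[1.0, 0.5], [0.5, 1.0]]
d2_correlated = mahalanobis_squared([0.0, 0.0], [1.0, 1.0], cov)
# With an identity covariance matrix, D^2 is the squared Euclidean distance.
d2_identity = mahalanobis_squared([0.0, 0.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

With the correlated covariance matrix, a difference along the direction in which the variables vary together counts for less (4/3 here) than the squared Euclidean distance of 2 would suggest.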

3 species of iris, 4 measurements

  • Euclidean distances

          A      B      C
      A   0
      B   3.2    0
      C   4.8    1.6    0

  • Penrose distances

          A      B      C
      A   0
      B   2.8    0
      C   3.9    1.5    0

  • Mahalanobis distances

          A      B      C
      A   0
      B   89.9   0
      C   179.4  17.2   0

[Figure: plot of units labelled by SPECIES (A, B, C)]
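As a rough check of how such a between-group distance matrix can be built, the sketch below computes Euclidean distances between group mean vectors. It rests on assumptions that the slides do not state: that the data are Fisher's iris data, that A, B and C correspond to setosa, versicolor and virginica, and that distances are taken between species means. The mean values are typed in from the standard dataset.

```python
import math

# Assumed species means (sepal length, sepal width, petal length, petal width)
# from the standard iris dataset; the A/B/C labelling is a guess.
species_means = {
    "A": [5.006, 3.428, 1.462, 0.246],  # setosa (assumed)
    "B": [5.936, 2.770, 4.260, 1.326],  # versicolor (assumed)
    "C": [6.588, 2.974, 5.552, 2.026],  # virginica (assumed)
}


def euclidean_distance(xi, xj):
    """Square root of the sum over variables of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def lower_triangle(groups):
    """Distances for each pair of groups, keyed as (later label, earlier label)."""
    labels = list(groups)
    return {
        (labels[i], labels[j]): euclidean_distance(groups[labels[i]], groups[labels[j]])
        for i in range(len(labels))
        for j in range(i)
    }


distances = lower_triangle(species_means)
rounded = {pair: round(value, 1) for pair, value in distances.items()}
```

Under these assumptions the rounded values are 3.2 (B to A), 4.8 (C to A) and 1.6 (C to B).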

The Standard Data Matrix

              Variables
           A  B  C  D  E  F  G  …
  Units 1
        2
        3
        4
        5
        6
        7
        …

The Association Matrix

              Units
           A  B  C  D  E  F  G  …
  Units A
        B
        C
        D
        E
        F
        G
        …

Association matrices

  • Social structure
      – association between individuals
  • Community ecology
      – similarity between species, sites
      – dissimilarities between species, sites
  • Genetic distances
  • Correlation matrices
  • Covariance matrices
  • Distance matrices
      – Euclidean, Penrose, Mahalanobis

Similarity / Dissimilarity

Association matrices: Dissimilarity / Similarity

Genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)

         LAT    GRI    VAX    KRI    MYR    WOW    HOB    WBE    HOR    AJA    PIK    ANV
  GRI  -0.24
  VAX   0.02   0.08
  KRI   0.02  -0.04  -0.19
  MYR  -0.27   0.44  -0.03  -0.11
  WOW   0.22   0.11   0.32  -0.10   0.10
  HOB  -0.04   0.11  -0.17  -0.13  -0.08  -0.12
  WBE   0.15   0.07  -0.08   0.08  -0.08   0.23   0.13
  HOR  -0.08   0.21  -0.14  -0.23   0.18   0.12   0.11   0.26
  AJA  -0.24   0.23  -0.04  -0.16  -0.01  -0.16   0.07   0.25   0.32
  PIK  -0.11   0.35  -0.07   0.04   0.02  -0.05   0.09   0.60   0.21   0.27
  ANV  -0.05  -0.23  -0.39  -0.39  -0.21  -0.13  -0.41   0.11   0.11   0.02  -0.06
  VEE   0.14   0.02   0.15  -0.11  -0.08   0.00  -0.09  -0.05   0.06   0.01  -0.17  -0.17

Mahalanobis distances between iris species

          A      B      C
      A   0
      B   89.9   0
      C   179.4  17.2   0

Association matrices: Symmetric / Asymmetric

Symmetric: Genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)

         LAT    GRI    VAX    KRI    MYR    WOW    HOB    WBE    HOR    AJA    PIK    ANV
  GRI  -0.24
  VAX   0.02   0.08
  KRI   0.02  -0.04  -0.19
  MYR  -0.27   0.44  -0.03  -0.11
  WOW   0.22   0.11   0.32  -0.10   0.10
  HOB  -0.04   0.11  -0.17  -0.13  -0.08  -0.12
  WBE   0.15   0.07  -0.08   0.08  -0.08   0.23   0.13
  HOR  -0.08   0.21  -0.14  -0.23   0.18   0.12   0.11   0.26
  AJA  -0.24   0.23  -0.04  -0.16  -0.01  -0.16   0.07   0.25   0.32
  PIK  -0.11   0.35  -0.07   0.04   0.02  -0.05   0.09   0.60   0.21   0.27
  ANV  -0.05  -0.23  -0.39  -0.39  -0.21  -0.13  -0.41   0.11   0.11   0.02  -0.06
  VEE   0.14   0.02   0.15  -0.11  -0.08   0.00  -0.09  -0.05   0.06   0.01  -0.17  -0.17

Asymmetric: Grooming rates of capuchin monkeys (Perry 1996)

Recipient

Actor A S N D W T

A - 58 35 21 23 004

S 416 - 286 181 90 74

N 103 255 - 96 99 43

D 233 93 105 - 134 69

W 212 152 146 251 - 104

T 25 29 37 36 53 -
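The difference between the two kinds of association matrix can be sketched in code. A symmetric matrix is fully described by its lower triangle, whereas a directed (actor to recipient) matrix needs every off-diagonal cell. The helper names and numbers below are hypothetical and are not taken from the two studies cited on this slide.

```python
def full_from_lower(lower, diagonal=0.0):
    """Build a full symmetric matrix from lower-triangular rows.

    lower[0] holds the single value for unit 2 versus unit 1,
    lower[1] the two values for unit 3 versus units 1 and 2, and so on.
    """
    n = len(lower) + 1
    full = [[diagonal] * n for _ in range(n)]
    for i, row in enumerate(lower, start=1):
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value  # mirror across the diagonal
    return full


def is_symmetric(matrix, tol=1e-9):
    """True when every cell [i][j] equals cell [j][i] within the tolerance."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical symmetric association (e.g. relatedness) among 3 units.
relatedness = full_from_lower([[0.2], [0.1, -0.3]])
# Hypothetical directed rates: rows are actors, columns are recipients.
directed = [[0.0, 5.0, 3.0], [4.0, 0.0, 2.0], [1.0, 6.0, 0.0]]

symmetric_flag = is_symmetric(relatedness)
directed_flag = is_symmetric(directed)
```

A check like this is a quick way to decide whether a method that assumes a symmetric matrix can be applied directly.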

Page 20: Introduction to  Multivariate Analysis and Multivariate Distances

Multivariate distancesbetween units or groups of units

3 Mahalanobis distance

p variablesvrs elements of inverse of covariance matrix

D x x v x xij ir jr rss

p

r

p

is js2

11 =

321

SPECIESCorrects forcorrelations

between variables

Page 21: Introduction to  Multivariate Analysis and Multivariate Distances

3 species of iris, 4 measurements

• Euclidean distances

       A      B      C
A      0
B    3.2      0
C    4.8    1.6      0

• Penrose distances

       A      B      C
A      0
B    2.8      0
C    3.9    1.5      0

• Mahalanobis distances

       A      B      C
A      0
B   89.9      0
C  179.4   17.2      0

[Figure legend: SPECIES A, B, C]
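As a rough check on the Euclidean entries, the sketch below computes distances between species mean vectors. It rests on several assumptions that the slide does not state: that the distances are taken between species means, that A, B and C correspond to Iris setosa, versicolor and virginica, that the means are those of the widely distributed Fisher iris data set, and that the slide values read as 3.2, 4.8 and 1.6 once decimal points are restored.

```python
import math

# Species means (cm) for sepal length, sepal width, petal length, petal width.
# Assumed values from the commonly distributed Fisher iris data set; the
# mapping of A, B, C to species is also an assumption.
means = {
    "A": [5.006, 3.428, 1.462, 0.246],  # assumed Iris setosa
    "B": [5.936, 2.770, 4.260, 1.326],  # assumed Iris versicolor
    "C": [6.588, 2.974, 5.552, 2.026],  # assumed Iris virginica
}


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Print the lower triangle in the same order as the slide.
for first, second in [("B", "A"), ("C", "A"), ("C", "B")]:
    print(first, second, round(euclidean(means[first], means[second]), 1))
```

Under these assumptions the three rounded distances agree with the Euclidean values on the slide.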

Page 22: Introduction to  Multivariate Analysis and Multivariate Distances

The Standard Data Matrix

           Variables
           A  B  C  D  E  F  G  …
Units  1
       2
       3
       4
       5
       6
       7
       …

Page 23: Introduction to  Multivariate Analysis and Multivariate Distances

The Association Matrix

           Units
           A  B  C  D  E  F  G  …
Units  A
       B
       C
       D
       E
       F
       G
       …

Page 24: Introduction to  Multivariate Analysis and Multivariate Distances

Association matrices

• Social structure
  – association between individuals
• Community ecology
  – similarity between species, sites
  – dissimilarities between species, sites
• Genetic distances
• Correlation matrices
• Covariance matrices
• Distance matrices
  – Euclidean, Penrose, Mahalanobis

Similarity / Dissimilarity

Page 25: Introduction to  Multivariate Analysis and Multivariate Distances

Association matrices: Dissimilarity / Similarity

Genetic relatedness among bottlenose dolphins (Krutzen et al. 2003) (similarity)

       LAT    GRI    VAX    KRI    MYR    WOW    HOB    WBE    HOR    AJA    PIK    ANV
GRI  -0.24
VAX   0.02   0.08
KRI   0.02  -0.04  -0.19
MYR  -0.27   0.44  -0.03  -0.11
WOW   0.22   0.11   0.32  -0.10   0.10
HOB  -0.04   0.11  -0.17  -0.13  -0.08  -0.12
WBE   0.15   0.07  -0.08   0.08  -0.08   0.23   0.13
HOR  -0.08   0.21  -0.14  -0.23   0.18   0.12   0.11   0.26
AJA  -0.24   0.23  -0.04  -0.16  -0.01  -0.16   0.07   0.25   0.32
PIK  -0.11   0.35  -0.07   0.04   0.02  -0.05   0.09   0.60   0.21   0.27
ANV  -0.05  -0.23  -0.39  -0.39  -0.21  -0.13  -0.41   0.11   0.11   0.02  -0.06
VEE   0.14   0.02   0.15  -0.11  -0.08   0.00  -0.09  -0.05   0.06   0.01  -0.17  -0.17

Mahalanobis distances between iris species (dissimilarity)

       A      B      C
A      0
B   89.9      0
C  179.4   17.2      0

Page 26: Introduction to  Multivariate Analysis and Multivariate Distances

Association matrices: Symmetric / Asymmetric

Genetic relatedness among bottlenose dolphins (Krutzen et al. 2003) (symmetric, so only the lower triangle is shown)

       LAT    GRI    VAX    KRI    MYR    WOW    HOB    WBE    HOR    AJA    PIK    ANV
GRI  -0.24
VAX   0.02   0.08
KRI   0.02  -0.04  -0.19
MYR  -0.27   0.44  -0.03  -0.11
WOW   0.22   0.11   0.32  -0.10   0.10
HOB  -0.04   0.11  -0.17  -0.13  -0.08  -0.12
WBE   0.15   0.07  -0.08   0.08  -0.08   0.23   0.13
HOR  -0.08   0.21  -0.14  -0.23   0.18   0.12   0.11   0.26
AJA  -0.24   0.23  -0.04  -0.16  -0.01  -0.16   0.07   0.25   0.32
PIK  -0.11   0.35  -0.07   0.04   0.02  -0.05   0.09   0.60   0.21   0.27
ANV  -0.05  -0.23  -0.39  -0.39  -0.21  -0.13  -0.41   0.11   0.11   0.02  -0.06
VEE   0.14   0.02   0.15  -0.11  -0.08   0.00  -0.09  -0.05   0.06   0.01  -0.17  -0.17

Grooming rates of capuchin monkeys (Perry 1996) (asymmetric: rows are actors, columns are recipients)

                 Recipient
Actor      A      S      N      D      W      T
A          -     58     35     21     23    004
S        416      -    286    181     90     74
N        103    255      -     96     99     43
D        233     93    105      -    134     69
W        212    152    146    251      -    104
T         25     29     37     36     53      -
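The difference between the two kinds of association matrix can be tested mechanically. This is a minimal sketch, assuming each matrix is stored as a full square list of lists; the function name `is_symmetric` and both 3-unit matrices are hypothetical and are not taken from the dolphin or capuchin data.

```python
def is_symmetric(matrix, tol=1e-9):
    """Return True if matrix[i][j] equals matrix[j][i] for every off-diagonal pair."""
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


# Hypothetical relatedness-style matrix: the value for (i, j) is the value for (j, i).
relatedness = [[0.0, -0.2, 0.1],
               [-0.2, 0.0, 0.3],
               [0.1, 0.3, 0.0]]

# Hypothetical grooming-style matrix: rows are actors, columns are recipients,
# so the rate from i to j need not match the rate from j to i.
grooming = [[0.0, 5.0, 3.0],
            [9.0, 0.0, 2.0],
            [1.0, 4.0, 0.0]]

print(is_symmetric(relatedness))  # True
print(is_symmetric(grooming))     # False
```

A symmetric matrix can be stored as a single triangle, as the relatedness table is; an asymmetric one needs the full actor-by-recipient square.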
