
Multivariate Analysis of Variance (MANOVA)

Find a significant difference between groups

[Figure: three groups, A, B and C, and the grand mean $\bar{x}_{ABC}$]

$H_0: \mu_A = \mu_B = \mu_C$
$H_a$: at least one $\mu_i$ differs from the others

The alternative could be true because all the means are different, or because just one of them is different from the others.

If we reject the null hypothesis, we need to perform further analysis to determine which population means differ from the others and by how much.
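A common choice for that further analysis is Tukey's honestly significant difference (HSD) test, which compares every pair of group means. A minimal R sketch, assuming R's built-in PlantGrowth data stand in for groups A, B and C (the slides do not prescribe a particular post-hoc test):

```r
# Post-hoc comparison of all pairs of group means after a one-way ANOVA.
# Assumption: PlantGrowth (plant weight in groups ctrl, trt1, trt2)
# stands in for groups A, B and C.
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey's HSD: difference for each pair of means, a 95% family-wise
# confidence interval (lwr, upr) and a p-value adjusted for multiple comparisons
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey$group)
```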

[Figure: group means $\bar{x}_A = 727.5$, $\bar{x}_B = 514.25$, $\bar{x}_C = 508$; grand mean $\bar{x}_{ABC} = 583.25$]

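Consider Univariate ANOVA

Used when you have 3 or more samples.

$$F = \frac{\text{signal}}{\text{noise}} = \frac{\text{variance between}}{\text{variance within}}$$

$$\text{variance between} = \frac{\sum_{i=1}^{n} (\bar{x}_i - \bar{x}_{ALL})^2}{n-1} \cdot r \qquad \text{variance within} = \frac{\sum_{i=1}^{n} \text{variance}_i}{n}$$

where $n$ is the number of groups, $r$ is the number of observations in each group, $\bar{x}_i$ is the mean of group $i$ and $\bar{x}_{ALL}$ is the grand mean. The variance between groups is the SIGNAL; the variance within groups is the NOISE. (These forms assume equal group sizes.)

A large F-value indicates a significant difference.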

Worked example (three groups, 4 observations per group):

$$\text{variance between} = \frac{\sum_{i \in \{A,B,C\}} (\bar{x}_i - \bar{x}_{ABC})^2}{3-1} \cdot 4 = \frac{(727.5 - 583.25)^2 + (514.25 - 583.25)^2 + (508 - 583.25)^2}{2} \cdot 4 = \mathbf{62463.25}$$

$$\text{variance within} = \frac{\text{var}_A + \text{var}_B + \text{var}_C}{3} = \frac{891.6667 + 819.3333 + 305.5833}{3} = \mathbf{672.1944}$$

$$F = \frac{\text{variance between}}{\text{variance within}} = \frac{62463.25}{672.1944} = \mathbf{92.924}$$
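The arithmetic of the worked example can be reproduced in R from the group summaries alone. A sketch assuming, as in the example, equal group sizes of 4; the variable names are illustrative:

```r
# Reproduce the worked one-way ANOVA from group summary statistics.
# Assumes equal group sizes (r = 4 observations per group).
group_means <- c(A = 727.5, B = 514.25, C = 508)
group_vars  <- c(A = 891.6667, B = 819.3333, C = 305.5833)
r <- 4                        # observations per group
k <- length(group_means)      # number of groups

grand_mean <- mean(group_means)   # 583.25 (valid because group sizes are equal)

# Signal: spread of the group means around the grand mean
var_between <- sum((group_means - grand_mean)^2) / (k - 1) * r

# Noise: average spread inside the groups
var_within <- mean(group_vars)

F_value <- var_between / var_within
c(between = var_between, within = var_within, F = F_value)
```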

One-way ANOVA in R:

anova(lm(YIELD~VARIETY))
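With equal group sizes, the signal/noise formulas give the same F-value as anova(). A sketch that checks this, assuming R's built-in PlantGrowth data (3 groups of 10 plants) in place of the YIELD/VARIETY data, which are not included here:

```r
# Check the slide's signal/noise formulas against anova(lm(...)).
# Assumption: PlantGrowth stands in for the YIELD/VARIETY data.
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)               # number of groups (n on the slide)
r <- length(y) / k            # observations per group (equal sizes)
group_means <- tapply(y, g, mean)
group_vars  <- tapply(y, g, var)

var_between <- sum((group_means - mean(y))^2) / (k - 1) * r
var_within  <- sum(group_vars) / k
F_by_hand   <- var_between / var_within

# The same F-value taken from R's one-way ANOVA table
tab <- anova(lm(weight ~ group, data = PlantGrowth))
F_from_anova <- tab[["F value"]][1]

c(by_hand = F_by_hand, anova = F_from_anova)
```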


[Figure: F-distribution; y-axis "Probability of observation", with regions labelled "signal > noise" and "signal < noise"]

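P-value (percentiles, probabilities): pf() returns the cumulative probability to the left of F, i.e. 1 − p-value

In R: pf(F, df1, df2)

In R: qf(p, df1, df2) (the inverse: the F-value with cumulative probability p)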
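Applied to the worked example, pf() and qf() give the p-value and the critical F-value. A sketch assuming 3 groups of 4 observations, so df1 = 3 − 1 = 2 and df2 = 12 − 3 = 9 (the slides do not state the degrees of freedom):

```r
# Turn the worked example's F-value into a p-value and a critical value.
# Assumption: df1 = 3 - 1 = 2 groups' worth, df2 = 12 - 3 = 9 residual df.
F_value <- 92.92439
df1 <- 2
df2 <- 9

# pf() gives the area to the LEFT of F, i.e. 1 - p-value
left_area <- pf(F_value, df1, df2)
p_value   <- 1 - left_area
# Same quantity, numerically safer for very small p-values
p_value_tail <- pf(F_value, df1, df2, lower.tail = FALSE)

# qf() goes the other way: the F-value with 95% of the distribution below it,
# i.e. the critical value for alpha = 0.05
F_critical <- qf(0.95, df1, df2)

c(p = p_value_tail, F_critical = F_critical)
```

Because the observed F is far beyond the critical value, the p-value is tiny and the null hypothesis of equal means would be rejected at α = 0.05.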

[Figure: F-distribution starting at 0, with cumulative probabilities 0.50 and 0.95 marked; α = 0.05]

F-distribution (a family of distributions whose shape depends on the degrees of freedom)

The larger the F-value, the further it lies in the tail, and the smaller the probability of obtaining an F-value that large by chance alone if the group means were truly equal. A small probability is therefore evidence that something is causing a significant difference between the groups.

Using DISCRIM to predict which group

Problem: a new skull is found, but we don't know whether it belongs to Homo erectus or Homo habilis, or whether it represents a new group.

[Figure: skull measurements for Homo erectus and Homo habilis, each with its group centroid, plus a new find of unknown origin]

How predictions work:

1. Calculate each group centroid.
2. Find which centroid is closest to the unknown data point.

New groups are defined when we find a significant difference between the new find and the predefined groups.
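The two prediction steps can be sketched in R. Assumptions: iris measurements stand in for skull measurements, the new find is a hypothetical data point, and plain Euclidean distance is used; full discriminant analysis (for example MASS::lda) also takes the covariance of the variables into account:

```r
# Nearest-centroid prediction following the two steps above.
# Assumption: iris stands in for skull data; Euclidean distance only.
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
groups <- iris$Species

# 1. Group centroids: the mean of each variable within each group
centroids <- apply(X, 2, function(col) tapply(col, groups, mean))

# 2. Distance from the unknown point to each centroid; the closest wins
new_find <- c(Sepal.Length = 5.0, Petal.Length = 1.5)   # hypothetical new find
dists <- sqrt(rowSums(sweep(centroids, 2, new_find)^2))
predicted <- names(which.min(dists))

print(round(dists, 3))
predicted
```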

Popular method in taxonomy and anthropology

Multivariate Analysis of Variance (MANOVA)

Is there a significant difference among groups based on multiple response variables? (i.e. ANOVA with multiple response variables)

MANOVA in R: output=manova(responseMatrix~predictorMatrix) (stats package)

When we calculate the centroid of a group, we build a probability distribution around the centroid for comparison.

You could then run repeated t-tests (with p-values adjusted for multiple comparisons) to compare the new data with each group, but MANOVA does it all for you in one shot!
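A minimal one-shot MANOVA in R, assuming the built-in iris data with two flower measurements as the response matrix and Species as the groups:

```r
# One MANOVA instead of many separate tests.
# Assumption: iris data; response matrix built with cbind().
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)

# Multivariate test of "are the group centroids equal?" using Wilks' lambda
res <- summary(fit, test = "Wilks")
print(res)
```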

Another lab on MANOVA for reference: Laura’s website, RENR 480, Lab 22

Assumptions of MANOVA

MANOVA is VERY sensitive to violated assumptions and to outliers.

Within groups we need to have:

1. Normality: residuals have to be normally distributed.
2. Homogeneity of variances: residuals need to have equal variances across groups.

The assumptions have to be met in the univariate context (for each variable) before they can be met for the multivariate analysis.

You therefore first have to check each individual measurement (response variable) for normality and homogeneity of variances, e.g. by making boxplots or plotting ANOVA residuals for each variable.
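Besides boxplots and residual plots, each response variable can be screened with a formal test. A sketch assuming the iris data, with the Shapiro-Wilk test on the ANOVA residuals (normality) and Bartlett's test (equal variances) as one possible pair of tests:

```r
# Check each response variable separately before running MANOVA.
# Assumption: iris data; Shapiro-Wilk and Bartlett's test as example tests.
responses <- c("Sepal.Length", "Petal.Length")

checks <- t(sapply(responses, function(v) {
  f <- reformulate("Species", response = v)    # e.g. Sepal.Length ~ Species
  res <- residuals(lm(f, data = iris))         # within-group residuals
  c(normality_p = shapiro.test(res)$p.value,
    equal_var_p = bartlett.test(f, data = iris)$p.value)
}))

# Small p-values flag a variable that may need a transformation
round(checks, 4)
```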

[Figure: distribution shapes and their boxplots: left-skewed (negatively skewed), normal (perfectly symmetric), right-skewed (positively skewed) and bimodal (two different modes, not necessarily symmetric); y-axis: frequency; mean, median and modes marked]

Assumptions of MANOVA

Generate boxplots for each response variable and assess shape & whiskers

Boxplots in R (multiple plots): boxplot(ResponseVariable~Group)
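The numbers behind the boxplots can also be pulled out directly, together with a mean vs median comparison as a rough skewness check. A sketch assuming the iris data:

```r
# Numbers behind boxplot(ResponseVariable ~ Group) without drawing the plot.
# Assumption: iris data, Sepal.Length as the response.
bp <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats <- bp$stats
colnames(box_stats) <- bp$names

# Mean above the median often hints at right (positive) skew,
# mean below the median at left (negative) skew
means   <- tapply(iris$Sepal.Length, iris$Species, mean)
medians <- tapply(iris$Sepal.Length, iris$Species, median)
print(box_stats)
round(means - medians, 3)
```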

Testing for Normality & Equal Variances – Residual Plots

Residual plots in R (multiple plots): plot(lm(ResponseVariable~Group)) (2nd plot)

[Figure: four example residual plots with axes "Predicted values" and "Observed (original units)", each centred on 0]

• NORMAL distribution: equal number of points along the observed axis
• EQUAL variances: equal spread on either side of the mean (predicted value = 0)
• Good to go!

• NON-NORMAL distribution: unequal number of points along the observed axis
• EQUAL variances: equal spread on either side of the mean (predicted value = 0)
• Optional to fix

• NORMAL/NON-NORMAL: look at a histogram or use a formal test
• UNEQUAL variances: cone shape, opening away from or towards zero
• This needs to be fixed for MANOVA (transformations)

• OUTLIERS: points that deviate from the majority of data points
• This needs to be fixed for MANOVA (transformations or removal)


Assumptions of MANOVA

If you violate the assumptions of MANOVA:

1. Transform your data (follow examples we will discuss on the board)

2. Use non-parametric options (e.g. PERMANOVA, Lab 6)

Multivariate Analysis of Variance (MANOVA) - output

You can test whether there is a significant difference among groups across all response variables combined, using the Wilks' lambda MANOVA test statistic.

Or you can test whether there is a significant difference among groups for each response variable separately.

P-value: the probability of observing a difference between groups at least as large as the one observed due to random chance alone. Thus, if the p-value is small, something is likely having an effect on the groups, causing the difference.
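Both kinds of output can be read off one fitted model. A sketch assuming the iris data, with Wilks' lambda for the overall test and summary.aov() for the per-response ANOVAs:

```r
# The two kinds of MANOVA output from the same model.
# Assumption: iris data with two measurements as responses.
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)

# 1. One overall test across all response variables (Wilks' lambda)
overall <- summary(fit, test = "Wilks")
overall_p <- overall$stats["Species", "Pr(>F)"]

# 2. A separate univariate ANOVA for each response variable
per_response <- summary.aov(fit)
per_response_p <- sapply(per_response, function(tab) tab[["Pr(>F)"]][1])

overall_p
per_response_p
```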