Statistical tests for differential expression in cDNA microarray experiments (2): ANOVA Xiangqin Cui and Gary A. Churchill Genome Biology 2003, 4:210 Presented by M. Carme Ruíz de Villa and Alex Sánchez Departament d’Estadística U.B.



Introduction


Remember …

We want to measure how gene expression changes under different conditions.
• Only two conditions and an adequate number of replicates: t-tests and their extensions.
• More than two conditions, or more than one factor: several approaches, e.g.
  - Analysis of Variance (ANOVA) (Churchill et al.)
  - Linear models (Smyth, Speed, …)


Sources of variation (1)

We want to determine when the variation due to gene expression is significant, but there are multiple sources of variation in the measurements besides gene expression itself.

We want to know when the variation in the measurements is caused by different levels of gene expression rather than by other factors.


Sources of variation (2)

Some sources of variation in the measurements of microarray experiments are:
• Array effects
• Dye effects
• Variety effects
• Gene effects
• Combinations of these (interactions)


Relative expression values

With more than two conditions we cannot simply compute ratios. ANOVA modelling instead yields estimates of the relative expression of each gene in each sample.

The ANOVA model is not based on log ratios; rather, it is applied directly to the (log-transformed) intensity data. However, the difference between two relative expression values can be interpreted as the mean log ratio for comparing the two samples.
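A minimal numerical sketch of that last statement, using hypothetical intensities for one gene and assuming a balanced layout in which both samples appear on every array (array and dye effects are ignored here for simplicity). On the log2 scale, the difference between the two samples' mean log intensities equals the mean of the per-array log ratios.

```python
import math

# Hypothetical raw intensities for one gene; sample_a[i] and sample_b[i]
# were measured on the same array i (values made up for illustration).
sample_a = [1200.0, 950.0, 1430.0, 1100.0]
sample_b = [600.0, 520.0, 700.0, 540.0]


def mean(values):
    return sum(values) / len(values)


# Relative expression of each sample, taken here as its mean log2 intensity.
rel_a = mean([math.log2(x) for x in sample_a])
rel_b = mean([math.log2(x) for x in sample_b])
difference = rel_a - rel_b

# Mean of the per-array log2 ratios a/b.
mean_log_ratio = mean([math.log2(a / b) for a, b in zip(sample_a, sample_b)])

print(difference, mean_log_ratio)  # the two numbers agree up to rounding
```

In the full ANOVA model the relative expression values are VG estimates rather than raw means, but the interpretation of their difference as an average log ratio rests on this same identity.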


Technical & biological replicates

If inference is being made on the basis of biological replicates and there is also technical replication, the technical replicates should be averaged to yield a single value for each independent biological unit.
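A small sketch of this averaging step. The unit names and values are hypothetical log-scale measurements for a single gene.

```python
from collections import defaultdict

# Hypothetical log-scale values for one gene: (biological unit, value).
# Units with several entries have technical replicates (e.g. repeated arrays).
measurements = [
    ("mouse1", 7.9), ("mouse1", 8.1),
    ("mouse2", 9.0), ("mouse2", 9.4), ("mouse2", 9.2),
    ("mouse3", 8.5),
]


def average_technical_replicates(rows):
    """Return one mean value per independent biological unit."""
    by_unit = defaultdict(list)
    for unit, value in rows:
        by_unit[unit].append(value)
    return {unit: sum(vals) / len(vals) for unit, vals in by_unit.items()}


per_unit = average_technical_replicates(measurements)
print(per_unit)  # one value per mouse
```

Any downstream test then works with three observations, one per mouse, so its degrees of freedom reflect the biological replication rather than the number of hybridizations.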


Review of ANOVA models


One-way ANOVA

Suppose you have a model for each measurement in your experiment:

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

• $y_{ij}$: jth measurement in the ith group
• $\mu$: overall mean effect (constant)
• $\alpha_i$: effect of the ith group (constant)
• $\varepsilon_{ij}$: experimental error term, $\varepsilon_{ij} \sim N(0, \sigma^2)$

Therefore, observations from group i are distributed with mean $\mu + \alpha_i$ and variance $\sigma^2$.
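As a sketch, the least-squares estimates of $\mu$ and $\alpha_i$ can be computed from group means, assuming the usual side condition $\sum_i n_i \alpha_i = 0$ (the model as written is not identifiable without some constraint). The data are the Example 1 values given later in this deck.

```python
# Example 1 normalized data (stages I, II and III-IV; five arrays each).
groups = {
    "I": [100, 90, 105, 83, 78],
    "II": [95, 93, 79, 85, 90],
    "III": [70, 72, 81, 74, 75],
}


def one_way_estimates(groups):
    """Least-squares estimates of mu and alpha_i, assuming sum_i n_i * alpha_i = 0."""
    all_values = [y for ys in groups.values() for y in ys]
    mu_hat = sum(all_values) / len(all_values)  # grand mean
    # Each group effect is the group mean's deviation from the grand mean.
    alpha_hat = {g: sum(ys) / len(ys) - mu_hat for g, ys in groups.items()}
    return mu_hat, alpha_hat


mu_hat, alpha_hat = one_way_estimates(groups)
print(round(mu_hat, 3), {g: round(a, 3) for g, a in alpha_hat.items()})
```

A different constraint (for example setting $\alpha_1 = 0$) would shift $\hat\mu$ and the $\hat\alpha_i$, but not the fitted group means $\hat\mu + \hat\alpha_i$.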


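Hypothesis Testing

We test whether the k group effects are all zero ($H_0: \alpha_1 = \dots = \alpha_k = 0$). The deviation of each observation from the grand mean splits into two parts:

$$\underbrace{(y_{ij} - \bar{y}_{..})}_{\text{overall variability}} = \underbrace{(y_{ij} - \bar{y}_{i.})}_{\text{within-group variability}} + \underbrace{(\bar{y}_{i.} - \bar{y}_{..})}_{\text{between-group variability}}$$

Intuition: if the between-group variability is large compared with the within-group variability, then the differences between the group means are significant.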


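Sum of Squares

With k groups, $n_i$ observations in group i and $n = \sum_i n_i$ observations in total:

$$SS_T = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2 \qquad \text{(Total sum of squares)}$$

$$SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2 \qquad \text{(Within sum of squares)}$$

$$SS_B = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{y}_{i.} - \bar{y}_{..})^2 = \sum_{i=1}^{k} n_i (\bar{y}_{i.} - \bar{y}_{..})^2 \qquad \text{(Between sum of squares)}$$

and $SS_T = SS_W + SS_B$.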
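A direct translation of these three sums into Python, applied to a small hypothetical unbalanced data set. It also checks numerically that the total sum of squares splits into the within and between parts.

```python
def sums_of_squares(groups):
    """Return (SS_T, SS_W, SS_B) for a list of groups of observations."""
    all_values = [y for ys in groups for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_within = 0.0
    ss_between = 0.0
    for ys in groups:
        group_mean = sum(ys) / len(ys)
        ss_within += sum((y - group_mean) ** 2 for y in ys)
        ss_between += len(ys) * (group_mean - grand_mean) ** 2
    return ss_total, ss_within, ss_between


# Hypothetical data: three groups of different sizes.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 2.0, 3.0]]
ss_t, ss_w, ss_b = sums_of_squares(groups)
print(ss_t, ss_w, ss_b)  # SS_T equals SS_W + SS_B up to rounding
```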


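Mean Squares

Between MS = Between SS / (k - 1)
Within MS = Within SS / (n - k)
F = Between MS / Within MS

Under the null hypothesis, F follows an F distribution with (k - 1, n - k) degrees of freedom. The results are summarized in the ANOVA table (see Example 1).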
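The following sketch computes the quantities of the ANOVA table from raw data and reproduces the Example 1 values (810.133, 747.200, F = 6.505). It stops at the F ratio; the p-value needs the F(k - 1, n - k) distribution, for example `scipy.stats.f.sf(F, k - 1, n - k)` if SciPy is available.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares and F for a one-way layout."""
    k = len(groups)
    n = sum(len(ys) for ys in groups)
    grand_mean = sum(sum(ys) for ys in groups) / n
    means = [sum(ys) / len(ys) for ys in groups]
    ss_between = sum(len(ys) * (m - grand_mean) ** 2 for ys, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "ss_between": ss_between, "df_between": k - 1, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": n - k, "ms_within": ms_within,
        "ss_total": ss_between + ss_within, "df_total": n - 1,
        "F": ms_between / ms_within,
    }


# Example 1 normalized data: stages I, II and III-IV, five arrays each.
stages = [
    [100, 90, 105, 83, 78],
    [95, 93, 79, 85, 90],
    [70, 72, 81, 74, 75],
]
table = one_way_anova(stages)
for name, value in table.items():
    print(name, round(value, 3))
```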


Multiple Factor ANOVA

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

The model can be extended by adding:
• more factors ($\alpha$, $\beta$, …)
• interactions between them ($(\alpha\beta)$, …)
• other terms …

This is used to model the different sources of variation that appear in microarray experiments.


Experiment 1: Latin Square

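[Figure: two-array design with liver and muscle swapped between arrays: liver/muscle on one array, muscle/liver on the other]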


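Random effects models

If the k factor levels can be considered a random sample from a population of levels, we have a random-effects ANOVA model:

$$Y_{ij} = \mu + A_i + e_{ij}$$

where $\mu$ is the overall mean, $A_i$ is a random variable instead of a constant, and $e_{ij}$ is the experimental error, with $E(A_i) = 0$, $E(e_{ij}) = 0$, $\operatorname{var}(A_i) = \sigma_A^2$, $\operatorname{var}(e_{ij}) = \sigma^2$, and $A_i$ and $e_{ij}$ independent. Hence

$$\operatorname{var}(Y_{ij}) = \sigma_A^2 + \sigma^2.$$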
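For the balanced case (every level observed n times), one standard way to estimate the two variance components is the ANOVA (method-of-moments) estimator, which uses $E(MS_A) = \sigma^2 + n\sigma_A^2$ and $E(MS_E) = \sigma^2$. The sketch below implements that estimator on hypothetical data; it is one common choice, not the only one (REML is a frequent alternative).

```python
def variance_components(groups):
    """ANOVA estimates of (sigma_A^2, sigma^2) for Y_ij = mu + A_i + e_ij (balanced)."""
    k = len(groups)
    n = len(groups[0])
    if any(len(ys) != n for ys in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand_mean = sum(sum(ys) for ys in groups) / (k * n)
    means = [sum(ys) / n for ys in groups]
    ms_a = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_e = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys) / (k * (n - 1))
    # Solve E(MS_A) = sigma^2 + n * sigma_A^2; negative estimates are set to zero.
    sigma2_a = max((ms_a - ms_e) / n, 0.0)
    return sigma2_a, ms_e


# Hypothetical data: k = 3 randomly chosen levels, n = 3 observations each.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
sigma2_a, sigma2 = variance_components(groups)
print(sigma2_a, sigma2)
```

Truncating a negative raw estimate (when $MS_A < MS_E$) at zero is a common convention rather than part of the model.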


Where to find more…

Draghici, S. (2003). Data Analysis Tools for Microarrays, Chapter 7 (ANOVA). Wiley.

Pavlidis, P. (2003). Using ANOVA for gene selection from microarray studies of the nervous system. http://microarray.cpmc.columbia.edu/pavlidis/doc/reprints/anova-methods.pdf


ANOVA Models for Microarray Data


Kerr & Churchill's model

$$y_{ijkg} = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (VG)_{kg} + \varepsilon_{ijkg}$$

• $y_{ijkg}$: expression measurement from the ith array, jth dye, kth variety and gth gene.
• $\mu$: average expression over all spots.
• $A_i$: effect of the ith array.
• $D_j$: effect of the jth dye.
• $V_k$: effect of the kth variety (treatment, sample, …).
• $G_g$: effect of the gth gene.
• $(AG)_{ig}$: effect of the combination of the ith array and the gth gene.
• $(VG)_{kg}$: effect of the combination of the kth variety and the gth gene.
• $\varepsilon_{ijkg}$: independent and identically distributed error terms.


Interpreting main effects
• A: differences in fluorescent signal from array to array (e.g. if arrays are probed under inconsistent conditions that increase or reduce hybridization of the labeled cDNA).
• D: differences between the two fluorescent dye labels (one dye may be consistently brighter than the other).
• G: differences in fluorescence between genes, even when they are equally expressed.
• V: differences in overall expression level between varieties (samples, tumour types, …).


Interpreting interactions
• DV: if, for a particular variety, labelling is done in separate runs of the process, differences between the runs can produce pools of cDNA of varying concentration or quality.
• AG (spot effect): spots for a given gene on different arrays vary in the amount of cDNA available for hybridization.
• DG: differences between the dyes that are gene-specific.
• VG: differences in expression for particular variety and gene combinations that are not explained by the average effects of those varieties and genes. THIS IS THE QUANTITY OF INTEREST!


Normalization

The A, D and V terms effectively normalize the data, so the normalization process is integrated with the data analysis.

This approach has several benefits (?):
• The normalization is based on a clearly stated set of assumptions.
• It systematically estimates the normalization parameters from all of the data.
• The model can be generalized to the situation where genes are spotted multiple times on each array rather …
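A simplified sketch of the idea that array and dye terms act as normalization parameters. For a complete, balanced layout in which every array carries both dyes and every gene, the least-squares array and dye effects of an additive model are marginal means minus the grand mean, and subtracting them removes the array and dye offsets. This ignores the variety term and all interactions, so it is an illustration rather than the full Kerr & Churchill fit; the data are hypothetical and constructed to be exactly additive.

```python
def normalize_array_dye(y):
    """Subtract least-squares array and dye main effects.

    y[a][d][g] is a log-scale value for array a, dye d and gene g; the layout
    must be complete and balanced (every array has every dye and every gene).
    """
    n_arr, n_dye, n_gene = len(y), len(y[0]), len(y[0][0])
    cells = [(a, d, g) for a in range(n_arr) for d in range(n_dye) for g in range(n_gene)]
    grand = sum(y[a][d][g] for a, d, g in cells) / len(cells)
    array_eff = [
        sum(y[a][d][g] for d in range(n_dye) for g in range(n_gene)) / (n_dye * n_gene) - grand
        for a in range(n_arr)
    ]
    dye_eff = [
        sum(y[a][d][g] for a in range(n_arr) for g in range(n_gene)) / (n_arr * n_gene) - grand
        for d in range(n_dye)
    ]
    normalized = [
        [[y[a][d][g] - array_eff[a] - dye_eff[d] for g in range(n_gene)] for d in range(n_dye)]
        for a in range(n_arr)
    ]
    return normalized, array_eff, dye_eff


# Hypothetical, exactly additive data: overall level 5, array offsets +/-0.5,
# dye offsets +/-0.3 and gene effects 1, 0, -1.
arr_off, dye_off, gene_eff = [0.5, -0.5], [0.3, -0.3], [1.0, 0.0, -1.0]
y = [[[5.0 + arr_off[a] + dye_off[d] + gene_eff[g] for g in range(3)] for d in range(2)]
     for a in range(2)]
normalized, array_eff, dye_eff = normalize_array_dye(y)
```

After normalization every array and dye channel shows the same value, 5 + gene effect, for each gene.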


Statistically Significant Effects

• Array, Dye, Variety and Gene effects
  - Goal: to estimate their values; we need not assess their significance.
  - Sometimes they do not appear at all (gene-level model).
• Array × Gene and Variety × Gene effects
  - May or may not be present.
  - Goal: to assess their significance, i.e. test whether the mean effect is 0 (if fixed) or the effect variance is 0 (if random).


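Test statistics: the 3 F's

Hypothesis testing involves the comparison of two models. In this setting we consider a null model of no differential expression (all VG = 0) and an alternative model with differential expression among the conditions (some VG not equal to zero).

F statistics are computed gene by gene from the residual sums of squares obtained by fitting each of these models. With $rss_0$, $df_0$ the residual sum of squares and degrees of freedom of the null model, $rss_1$, $df_1$ those of the alternative model, and $\sigma^2_{pool}$ the residual variance pooled over all genes:

$$F_1 = \frac{(rss_0 - rss_1)/(df_0 - df_1)}{rss_1/df_1}$$

$$F_2 = \frac{(rss_0 - rss_1)/(df_0 - df_1)}{(rss_1/df_1 + \sigma^2_{pool})/2}$$

$$F_3 = \frac{(rss_0 - rss_1)/(df_0 - df_1)}{\sigma^2_{pool}}$$

F1 uses only the gene-specific residual variance, F3 uses a common variance for all genes, and F2 uses the average of the two.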
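A sketch of the three statistics for a single gene. The pooled variance is supplied by the caller (how it is pooled across genes is left open here), and the usage lines apply the formulas to the Example 1 gene with a hypothetical pooled variance of 50.0. With the common-mean model as the null and the stage-means model as the alternative, F1 reduces to the ordinary one-way F = 6.505.

```python
def rss_common_mean(values):
    """Residual SS and df of the null model: one mean for all observations."""
    m = sum(values) / len(values)
    return sum((y - m) ** 2 for y in values), len(values) - 1


def rss_group_means(groups):
    """Residual SS and df of the alternative model: one mean per group."""
    rss = 0.0
    for ys in groups:
        m = sum(ys) / len(ys)
        rss += sum((y - m) ** 2 for y in ys)
    n = sum(len(ys) for ys in groups)
    return rss, n - len(groups)


def three_f(rss0, df0, rss1, df1, sigma2_pool):
    """F1, F2 and F3 as defined on this slide."""
    numerator = (rss0 - rss1) / (df0 - df1)
    gene_var = rss1 / df1  # gene-specific residual variance
    f1 = numerator / gene_var
    f2 = numerator / ((gene_var + sigma2_pool) / 2)
    f3 = numerator / sigma2_pool
    return f1, f2, f3


stages = [[100, 90, 105, 83, 78], [95, 93, 79, 85, 90], [70, 72, 81, 74, 75]]
rss0, df0 = rss_common_mean([y for ys in stages for y in ys])
rss1, df1 = rss_group_means(stages)
sigma2_pool = 50.0  # hypothetical pooled residual variance across genes
f1, f2, f3 = three_f(rss0, df0, rss1, df1, sigma2_pool)
print(round(f1, 3), round(f2, 3), round(f3, 3))
```

Because this gene's own variance (about 62.3) exceeds the assumed pooled value, F2 and F3 come out larger than F1; a gene with unusually small residual variance would show the opposite pattern, which is what F2 and F3 are designed to temper.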


Example 1

A gene that is believed to be related to ovarian cancer is investigated. The cancer is sub-classified into 3 categories (stages): I, II and III-IV. 15 samples, 5 per stage, are available. They are labelled with 3 colors and hybridized on 4-channel cDNA arrays, one sample per stage on each of 5 arrays (1 channel empty). (A seemingly more reasonable procedure: a double dye-swap reference design.)


Example 1. Normalized Data

Array   Stage I   Stage II   Stage III
1       100       95         70
2       90        93         72
3       105       79         81
4       83        85         74
5       78        90         75


Example 1: ANOVA table (1)

ANOVA, dependent variable EXPRLEVE

Source            Sum of squares   df   Mean square   F       Sig.
Between groups    810.133          2    405.067       6.505   .012
Within groups     747.200          12   62.267
Total             1557.333         14

[Figure: boxplots of EXPRLEVE (axis 60 to 110) by CANCTYP 1, 2, 3; N = 5 per group]

If the arrays are homogeneous, the appropriate model is a one-factor ANOVA.


Example 1: Blocking

Two-way ANOVA without replication, using stages II and III only (rows = arrays, columns = stages; critical F at α = 0.05):

Source of variation   Sum of squares   df   Mean square   F            P-value      Critical F
Rows (arrays)         18.4             4    4.6           0.08440367   0.98276859   6.38823394
Columns (stages)      490              1    490           8.99082569   0.04000226   7.70864972
Error                 218              4    54.5
Total                 726.4            9

If the arrays are not homogeneous, the appropriate model is a two-factor ANOVA (with a new block factor for arrays).
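A sketch of the two-way additive (randomized block) ANOVA behind this table, applied to the stage II and III columns of the Example 1 data. It reproduces the sums of squares and F values above; p-values and critical values are omitted to keep it dependency-free.

```python
def randomized_block_anova(table):
    """Two-way additive ANOVA without replication.

    table[r][c] is the observation for block r (array) and treatment c (stage).
    """
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # what the blocks and stages leave unexplained
    ms_error = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows, "F_rows": (ss_rows / (r - 1)) / ms_error,
        "ss_cols": ss_cols, "F_cols": (ss_cols / (c - 1)) / ms_error,
        "ss_error": ss_error, "ss_total": ss_total,
    }


# Stages II and III of the Example 1 data; rows are arrays 1-5.
table = [[95, 70], [93, 72], [79, 81], [85, 74], [90, 75]]
result = randomized_block_anova(table)
print(result)
```

Removing the 18.4 units of array-to-array variation from the error term is what the blocking buys; here the array effect is small, so the gain is modest.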


Example 2: CAMDA kidney data
ftp://ftp.camda.duke.edu/CAMDA02_DATASETS/papers/README_normal.html

• 6 mouse kidney samples (suppose 6 different treatments)
• Compared with a common reference in a double reference design:
  - dye swap
  - replicate arrays


2.1. The ANOVA model

We work only at the gene level, so the main effects (A, D, V, G) do not appear as defined above:

$$Y_{ijk} = DG_i + AG_j + VG_k + \varepsilon_{ijk}$$

with i = 1, 2 (dyes), j = 1, 2 (arrays) and k = 1, …, 6 (samples).


Example 3: A two-factor design, Diet × Strain


3.2. Design


3.3. The ANOVA model

$$Y_{ijklm} = DG_i + AG_j + \text{Strain}_l + \text{Diet}_m + (\text{Strain}{:}\text{Diet})_{lm} + VG_k + \varepsilon_{ijklm}$$

with i = 1, 2 (dyes), j = 1, 2 (arrays), k = 1, …, 12 (samples), l = 1, 2, 3 (strains) and m = 1, 2 (diets).
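A simplified sketch of what the interaction test compares, under explicit assumptions: only the Strain and Diet factors are kept (the DG, AG and VG terms of the model are ignored), the layout is balanced with r replicates per Strain × Diet cell, and an F1-style gene-specific denominator is used. In that balanced case the drop in residual sum of squares from the additive model to the full model equals the interaction sum of squares. The data in the usage lines are hypothetical.

```python
def interaction_f(cells):
    """F statistic for the A x B interaction in a balanced two-way layout.

    cells[s][d] holds the r replicate values for level s of factor A
    (e.g. strain) and level d of factor B (e.g. diet).
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(c) / r for c in row] for row in cells]
    grand = sum(sum(row) for row in cell_mean) / (a * b)
    a_mean = [sum(row) / b for row in cell_mean]
    b_mean = [sum(cell_mean[s][d] for s in range(a)) / a for d in range(b)]
    # Interaction SS = rss(additive model) - rss(full model) when balanced.
    ss_int = r * sum(
        (cell_mean[s][d] - a_mean[s] - b_mean[d] + grand) ** 2
        for s in range(a) for d in range(b)
    )
    # Residual SS of the full model: deviations from the cell means.
    rss_full = sum(
        (y - cell_mean[s][d]) ** 2
        for s in range(a) for d in range(b) for y in cells[s][d]
    )
    df_int = (a - 1) * (b - 1)
    df_full = a * b * (r - 1)
    return (ss_int / df_int) / (rss_full / df_full)


# Hypothetical data: 3 strains x 2 diets x 2 replicates.
additive = [[[-0.5, 0.5], [2.5, 3.5]],
            [[0.5, 1.5], [3.5, 4.5]],
            [[1.5, 2.5], [4.5, 5.5]]]
with_interaction = [[[-0.5, 0.5], [-0.5, 0.5]],
                    [[-0.5, 0.5], [-0.5, 0.5]],
                    [[-0.5, 0.5], [5.5, 6.5]]]
print(interaction_f(additive), interaction_f(with_interaction))
```

In the first data set the diet effect is the same for every strain, so the interaction F is 0; in the second only one strain responds to the diet, which gives a large F.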


3.4. Sample R code (1)

library(maanova)

# Load the example data and build a maanova data object
data(paigen)
paigen <- createData(rawdata, 2)

# Full model, including the Strain x Diet interaction
model.full.fix <- makeModel(data = paigen,
                            formula = ~DG + AG + SG + Strain + Diet + Strain:Diet)
anova.full.fix <- fitmaanova(paigen, model.full.fix)

# Reduced model without the interaction
model.noint.fix <- makeModel(data = paigen,
                             formula = ~DG + AG + SG + Strain + Diet)
anova.noint.fix <- fitmaanova(paigen, model.noint.fix)

3.4. Sample R code (2)

# Permutation tests:
# test for the interaction effect by comparing the full and reduced models
test.int.fix <- ftest(paigen, model.full.fix, model.noint.fix, n.perm = 500)

# Volcano plot of the interaction test results
idx.int.fix <- volcano(anova.full.fix, test.int.fix, title = "Int. test")
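The permutation idea itself can be sketched for a simpler case: a one-way layout (the Example 1 gene) in which the stage labels are shuffled and the F statistic recomputed. This only illustrates the principle; it does not reproduce the permutation scheme that maanova's `ftest` uses for the interaction test, and the number of permutations and the seed are arbitrary choices.

```python
import random


def f_statistic(groups):
    """Classical one-way ANOVA F ratio (between MS / within MS)."""
    k = len(groups)
    n = sum(len(ys) for ys in groups)
    grand = sum(sum(ys) for ys in groups) / n
    means = [sum(ys) / len(ys) for ys in groups]
    ss_b = sum(len(ys) * (m - grand) ** 2 for ys, m in zip(groups, means))
    ss_w = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


def permutation_pvalue(groups, n_perm=2000, seed=1):
    """Estimate P(F_perm >= F_obs) by shuffling the group labels."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [y for ys in groups for y in ys]
    sizes = [len(ys) for ys in groups]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled values into groups of the original sizes.
        perm_groups, start = [], 0
        for size in sizes:
            perm_groups.append(pooled[start:start + size])
            start += size
        if f_statistic(perm_groups) >= observed:
            exceed += 1
    # Count the observed labelling as one of the permutations, so p > 0.
    return observed, (exceed + 1) / (n_perm + 1)


stages = [[100, 90, 105, 83, 78], [95, 93, 79, 85, 90], [70, 72, 81, 74, 75]]
f_obs, p_perm = permutation_pvalue(stages)
print(round(f_obs, 3), p_perm)
```

A permutation p-value does not rely on the normal-error assumption behind the F distribution, which is one reason permutation tests are attractive when there are few replicates per gene.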